Posted to user@couchdb.apache.org by Mister Donut <la...@gmail.com> on 2009/02/09 05:52:39 UTC

The Blog

So.

The blog.

Store the comments inside the blog entry.

At the same time User A and B post a comment.
A's or B's comment disappears.
Use the _rev and some fancy scripts to merge the two versions.
And pray there hasn't been a _compact in between.
Doesn't sound so great. Also, 1k.

Store the comment separately.

Great! Views, here I come! All problems solved!

Wait. What if there are ten thousand comments.
Maybe we would like to count them!
A counter. Not possible, son.
MapReduce! We don't need live counter updates.
But we still need to store the counter separately.
Twenty blog entries on one page, twenty queries.
Who cares! It's really fast!
This is getting exciting.

Pagination!
Let's see. We can implement forward and backward easily.
But this is getting really annoying if we want to move fast.
Like, see comments starting from two weeks ago. Or posts!
Sadly, we cannot guess these dates.

So, we put the posts into groups, based on date.
Show me all comments from two weeks ago!
Great idea!
But, son. That ain't possible. Unless we use a lot of _rev.
Doesn't sound so great anymore.

Sob. Let's check the website.
CouchDB.
Documentation.
Introduction.
What it is Not.
A seamless persistence layer for an OO programming language.

But! It told me it can manage my documents!

So, what exactly can you do with CouchDB?

It seems it is absolutely impossible to use CouchDB for anything
that allows users to contribute to a document at the same time.

Unless you use a lot of silly _rev.
Which is exactly why RDBMS were built?

Hell. bulk_docs. Isn't that exactly what CouchDB is NOT all about?

Sooner or later you'll have someone suggesting to implement...
...bulk_save!

I am really really lost.

Re: The Blog

Posted by Patrick Aljord <pa...@gmail.com>.
On Sun, Feb 8, 2009 at 11:52 PM, Mister Donut <la...@gmail.com> wrote:
> So.
>
> The blog.
>
> Store the comments inside the blog entry.
>

Bad idea, store comments in their own docs.
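
A minimal sketch of that layout, written in Python rather than CouchDB's own JavaScript view language. The field names (type, post_id, created_at) and the helpers map_comments and build_view are assumptions made for illustration, and nothing here talks to a real server. Because each comment is its own document, two users commenting at the same time create two documents and do not compete for the same _rev.

```python
# Toy model: each comment is a separate document that points at its post.
# Field names ("type", "post_id", "created_at") are illustrative assumptions.
docs = [
    {"_id": "post-1", "type": "post", "title": "The Blog"},
    {"_id": "c-a", "type": "comment", "post_id": "post-1",
     "created_at": "2009-02-09T05:52:39Z", "author": "A", "text": "First!"},
    {"_id": "c-b", "type": "comment", "post_id": "post-1",
     "created_at": "2009-02-09T05:52:40Z", "author": "B", "text": "Second!"},
]


def map_comments(doc):
    """Stand-in for a view's map function: yield (key, value) per comment."""
    if doc.get("type") == "comment":
        yield [doc["post_id"], doc["created_at"]], doc["_id"]


def build_view(all_docs, map_fn):
    """Run the map function over every document and sort the rows by key."""
    rows = [row for doc in all_docs for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


comments_by_post = build_view(docs, map_comments)
```

Python's list ordering is only a rough stand-in for CouchDB's view collation; the point is that the compound key groups comments by post and orders them by time.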

> Great! Views, here I come! All problems solved!
>
> Wait. What if there are ten thousand comments.
> Maybe we would like to count them!

Sure, use the view to get all comments and specify count=0.

> Pagination!
> Let's see. We can implement forward and backward easily.
> But this is getting really annoying if we want to move fast.
> Like, see comments starting from two weeks ago. Or posts!
> Sadly, we cannot guess these dates.

startkey, endkey and limit.

> So, what exactly can you do with CouchDB?
>

Anything!

> Sooner or later you'll have someone suggesting to implement...
> ...bulk_save!

It's already there.

Make sure you read the docs and ask in #couchdb; the mailing list is good too.

Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
Hi Mister Donut,

It would be easier to follow your thoughts if you would take the time
to formulate them in complete sentences. Your offensive tone
doesn't help either.

For pagination, check this mailing list's archives for several
solutions.

http://mail-archives.apache.org/mod_mbox/couchdb-user/
http://mail-archives.apache.org/mod_mbox/incubator-couchdb-user/

For the book: We're 4 chapters into it, there's more material
on the way that answers your questions. In the meantime,
we appreciate any constructive criticism on the book's
mailing list:

http://groups.google.com/group/couchdb-relax

For the wiki, grab an account and fix what's unhelpful. Or
at least, file a bug report in JIRA so we are aware of any
specific issue in the wiki. (Yes, the wiki is not perfect, but
it is open for all to edit).

https://issues.apache.org/jira/browse/COUCHDB

More comments inline.

On 9 Feb 2009, at 11:28, Mister Donut wrote:
>
> I don't think you understand my point.
> Yes, I know. Maybe you should re-read.

Maybe you should rephrase? Just a suggestion. Communication
is two-way.


> You still need one lookup for every blog entry on a page.
> And there is no way you can ever store the comment count inside the  
> blog entry.
>
>> startkey, endkey and limit.
>
> That sounds so great. But wait. LIMIT.
> I know that from SQL. It doesn't scale.
> Jumping to page 1234567 of ten million. Please, no.

limit is documented to be slow and should not be used to
implement paging. That's what startkey and endkey are
for and the mailing list archives show you how that is done.
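
As a rough illustration of what startkey and endkey buy you, here is a toy Python model of a key-range query over a sorted view. The row data, the [post_id, created_at] key shape and the query_range helper are assumptions for the sketch; a real query would pass startkey and endkey to a view over HTTP.

```python
from bisect import bisect_left, bisect_right

# Toy view rows, already sorted by key = [post_id, created_at].
rows = [
    (["post-1", "2009-01-20T10:00:00Z"], "c-1"),
    (["post-1", "2009-01-26T09:30:00Z"], "c-2"),
    (["post-1", "2009-02-02T18:45:00Z"], "c-3"),
    (["post-2", "2009-01-27T12:00:00Z"], "c-4"),
]


def query_range(view_rows, startkey, endkey):
    """Return rows with startkey <= key <= endkey, like a key-range query."""
    keys = [key for key, _ in view_rows]
    lo = bisect_left(keys, startkey)
    hi = bisect_right(keys, endkey)
    return view_rows[lo:hi]


# All comments on post-1 from 26 January 2009 onwards. "\ufff0" is a
# high-sorting character used here as an upper bound for the timestamp.
hits = query_range(rows, ["post-1", "2009-01-26"], ["post-1", "\ufff0"])
```

The lookup is a binary search into sorted keys, so "comments starting from two weeks ago" does not require guessing any existing key: any date string works as a startkey.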



> And you cannot, ever, group items based on a variable criteria.
> For example in batches of around one hundred.
> Which solves pagination.
> A view cannot provide that. But again, there is no way you can  
> "manually" do it.

Which is not the only solution to pagination.


>> Anything!
>
> I challenge you. Build me a counter!
> No seriously.
>
> Pick one:
> GROUP BY, LIMIT, _rev to fake transactions. Uh. Oh. Hello SQL?
> You know something magical that allows you to avoid _rev.

See CAP*, RDBMSs pick consistency and availability where CouchDB
picks availability and partitionability.

* Consistency, Availability, Partitioning, pick two.


> So please tell me.

CouchDB is not an RDBMS, you need to change your thinking to
solve the things that you are accustomed to solve in RDBMS land.
It took me quite a while. <insert hammer-nail-tool-analogy>. If
you try to find 1:1 mappings of solutions from the RDBMS world
to CouchDB, you must conclude that CouchDB sucks. But that
doesn't mean that CouchDB sucks. I hope we can provide
you with enough information about how to do things in CouchDB
land.

Thanks for asking hard questions, would be great if we could
keep this dialogue going.

Cheers
Jan
--

> What exactly can I use CouchDB for that uses its strengths,
> and not weaknesses? I am honestly not trying to make a fool of anyone.
> The CouchDB book seems only to justify the design choices.

> The Wiki is completely unhelpful.
> Every time it gets interesting.
> http://wiki.apache.org/couchdb/How_to_implement_tagging
> It stops. That second article, that would certainly enlighten me.
> Yes, how, would you go about implementing that?
>
> I don't see how you can make it work but by using an awful lot of  
> merging logic.
> Isn't that why you use a RDBMS in the first place?
>
>> It's already there.
>
> You're right, I missed that one. It's even more scary.


Re: The Blog

Posted by Chris Anderson <jc...@apache.org>.
On Mon, Feb 9, 2009 at 5:45 PM, Patrick Aljord <pa...@gmail.com> wrote:
> On Mon, Feb 9, 2009 at 5:28 AM, Mister Donut <la...@gmail.com> wrote:
>>> startkey, endkey and limit.
>>
>> That sounds so great. But wait. LIMIT.
>> I know that from SQL. It doesn't scale.
>
> That's what caching is for.

I should clear up that limit performs just fine on CouchDB. Startkey
and limit is just as performant as startkey and endkey.

It's large values of skip that you should avoid.
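
One common way to page with startkey and limit, and without skip, is to request one row more than the page size and use that extra row's key as the start of the next page. The Python sketch below models this on an in-memory list; the row data and the fetch_page helper are assumptions for illustration, not CouchDB client code.

```python
from bisect import bisect_left

# Toy view: 10 rows sorted by key. Real rows would come from a view query.
rows = [(f"2009-02-{day:02d}", f"comment-{day}") for day in range(1, 11)]


def fetch_page(view_rows, startkey=None, page_size=3):
    """Return (page, next_startkey) using only startkey and limit.

    One extra row is requested; if it exists, its key starts the next page.
    """
    keys = [key for key, _ in view_rows]
    start = 0 if startkey is None else bisect_left(keys, startkey)
    chunk = view_rows[start:start + page_size + 1]  # limit = page_size + 1
    page, extra = chunk[:page_size], chunk[page_size:]
    next_startkey = extra[0][0] if extra else None
    return page, next_startkey


page1, cursor = fetch_page(rows)
page2, cursor2 = fetch_page(rows, startkey=cursor)
```

The keys in this toy are unique. With duplicate keys a real query would also need to pin the document id (CouchDB has a startkey_docid parameter for that), which this sketch leaves out.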

-- 
Chris Anderson
http://jchris.mfdz.com

Re: The Blog

Posted by Patrick Aljord <pa...@gmail.com>.
On Mon, Feb 9, 2009 at 5:28 AM, Mister Donut <la...@gmail.com> wrote:
> I don't think you understand my point.
> Yes, I know. Maybe you should re-read.
> You still need one lookup for every blog entry on a page.

No, you can do so with only one query using Map/Reduce.
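
A sketch of the idea in Python: the map step emits one row per comment keyed by post, and a grouped reduce sums them, so the counts for every post on a page come out of a single reduced result. The row data and the helper names are assumptions; in CouchDB the grouping would correspond to querying a reduce view with group=true.

```python
from collections import Counter

# Toy map output: one row per comment, keyed by the post it belongs to.
map_rows = [
    ("post-1", 1), ("post-1", 1), ("post-1", 1),
    ("post-2", 1),
    ("post-3", 1), ("post-3", 1),
]


def grouped_count(rows_in):
    """Mimic a grouped reduce that sums the emitted 1s per key."""
    totals = Counter()
    for key, value in rows_in:
        totals[key] += value
    return dict(totals)


def counts_for_page(rows_in, post_ids):
    """One pass over the reduced view yields counts for every post on a page."""
    totals = grouped_count(rows_in)
    return {post_id: totals.get(post_id, 0) for post_id in post_ids}


page_counts = counts_for_page(map_rows, ["post-1", "post-2", "post-4"])
```

The count is derived from the comment documents themselves, so nothing has to be stored inside the blog entry and no counter document needs updating.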


>> startkey, endkey and limit.
>
> That sounds so great. But wait. LIMIT.
> I know that from SQL. It doesn't scale.

That's what caching is for.

> Jumping to page 1234567 of ten million. Please, no.
>

What's the point of that? Unless you expect your number of comments to
never grow, in which case caching will do.


> And you cannot, ever, group items based on a variable criteria.

You can do that with Map/Reduce.

> I challenge you. Build me a counter!

Create a view that gets all the comments and get them with limit=0,
there's your counter.

Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
On 9 Feb 2009, at 12:38, Mister Donut wrote:

>> It would be easier to follow your thoughts if you would take the time
>> to formulate them in complete sentences. Your offensive tone
>> doesn't help either.
>
> I don't know where I am being offensive, but I don't like when people
> just assume that I am stupid or haven't done my research.

I never said you were stupid. Your mails prove quite the opposite.
I implied you haven't done your research, which you have. Sorry,
no offence meant. Thanks for being constructive here! :)

> Because, yes, I
> have searched for pagination, and yes, I have read the Wiki, and yes,
> I realize that CouchDB isn't a RDBMS, and no, I am not trying to put
> my RDBMS ideas into CouchDB. And this isn't meant to be offensive, but rather
> to show you that I really tried, to my best knowledge. I know what
> MapReduce is. I know what Key/Value pairs are. I know how CouchDB puts
> them together (the idea, I am not an Erlang hacker).

I didn't imply that you don't know about any of the above. I was
restating a pattern I have experienced myself and commonly observed in
others: it takes a bit of getting used to the CouchDB way, and reaching
for analogous SQLisms rather hinders understanding CouchDB. I'm glad
that this is not the case with you.


>> limit is documented to be slow and should not be used to
>> implement paging. That's what startkey and endkey are
>> for and the mailing list archives show you how that is done.
>> Which is not the only solution to pagination.
>
> Startkey and Endkey allow you to implement "Back" and "Forward". I
> wrote that. They do not allow you to jump directly somewhere. For
> that, you'd somehow need to store start- and endkeys, but the nature
> of CouchDB doesn't allow you to. Unless I am missing a point. Amazon
> SimpleDB, and Google WebApps (sp?) suffer from the same problem, if
> you read their forums. Basically they tell you: "Well, it doesn't
> work".

CouchDB won't allow you to "jump to page X", but if you look at
e.g. Google, it doesn't work either. (You often get 10+ pages
of results shown on the first page and if you go to the second
you end up only seeing 3 follow up pages). You can jump to
a page if you are using a sequence as your key in a view which
would be analogous to how an RDBMS would do it. But surrogate
keys are considered harmful and I'd say (but that really depends
on the application), not very helpful. The problem with sequences
in a distributed database is that you can't create them reliably (CAP
again), that's why SimpleDB and BigTable won't do it.


>> See CAP*, RDBMSs pick consistency and availability where CouchDB
>> picks availability and partitionability.
>> * Consistency, Availability, Partitioning, pick two.
>
> I know, as I said, I read the book. Not trying to be offensive.
> What application benefits from picking Partitionability and
> Availability? And can work with a lack of Consistency?
>

CouchDB is eventually consistent (in the partitioned case). The
benefit from allowing for partitioning is that you can build systems
with reliable fault tolerance and load balancing (or scaling if you
will) that use more than a single machine.


> You cannot store any kind of counters, or duplicate data, basically
> anything that needs to be the same as something else.

Can you elaborate on that? I don't quite get the "or duplicate data,
basically anything that needs to be the same as something else" bit.


>> If you try to find 1:1 mappings of solutions from the RDBMS world
>> to CouchDB, you must conclude that CouchDB sucks. But that
>> doesn't mean that CouchDB sucks.
>
> What I am trying to find, is an example, that, by using CouchDB,
> offers a unique solution.
> All examples of CouchDB I have seen so far, are either so simple, they
> don't show what CouchDB is all about (probably the author doesn't get
> it, not trying to be offensive, but posts like "10 Reasons Why CouchDB
> Is Better Than (Random RDBMS)"... well.) or they try to solve a
> problem that has already been solved by RDBMS.

CouchDB solves the similar problems as an RDBMS, but starting
from a different angle (distributed operation instead of single-node
operation, CAP, yadda yadda).


The 10 reasons post is a little populist, granted :)


> Availability (well, yeah, I don't know what to say).
> Partitionability:
> The Replication feature is really interesting. It is intriguing. It
> makes me want to make something with it. But I just cannot find anything
> that could "exploit" this very feature.

A couple of things you can do with CouchDB replication (again, not
saying that you can't do some of those with an RDBMS, but it is getting
harder the further you move down the list):

  - Build a hot-spare failover clone of a live database server.
  - Build a set of read-only nodes for spreading read load.
  - Build an N-master database cluster that is eventually consistent,
    can linearly scale writes by adding more nodes and protects
    against machine failure.
  - The above, but with geographic distribution.
  - Build a distributed p2p application where each user has a local,
    off-line copy of her data (to allow for availability without a
    network connection and reducing latency) that can exchange data
    with other users using replication when online.
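
Every item on that list starts from the same primitive: asking one node to replicate a database to or from another. A minimal Python sketch of such a request is below. It only builds the HTTP request for the _replicate endpoint and does not send it; the host names and the database name "blog" are hypothetical.

```python
import json
from urllib.request import Request


def replication_request(server, source, target):
    """Build (but do not send) a POST to the server's _replicate endpoint."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return Request(
        url=f"{server}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts and database names, for illustration only.
req = replication_request(
    "http://localhost:5984",
    "blog",
    "http://spare.example.com:5984/blog",
)
```

Sending the request (for example with urllib.request.urlopen) would push the local "blog" database to the spare. Swapping source and target pulls instead, and doing both is what keeps two masters in sync.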


Cheers
Jan
--


Re: The Blog

Posted by Antony Blakey <an...@gmail.com>.
On 09/02/2009, at 11:19 PM, Mister Donut wrote:

> I would just, really really really, like to see an example that goes
> beyond schema-free. That handles replication. I think that would show
> where CouchDB shines, and where you'd fail with a RDBMS.

I presume you've seen http://wiki.apache.org/couchdb/CouchDB_in_the_wild.

CouchDB is useful even where you wouldn't fail with an RDBMS. The data
model, HTTP API, and map/reduce materialised views are features that
make some applications considerably easier to conceptualise, design,
write, deploy and manage, and that's ignoring replication.

Considering replication, I'm about to deploy an application platform  
that uses replication to distribute read-only data *and* applications  
that use that data in a p2p mesh. I don't know how I would do that  
using an RDBMS - I guess I'd end up replicating CouchDB's mechanism in  
some form.

A putative extension to that gig is to make it into a p2p read/write  
collaborative content development platform. Once again, CouchDB  
replication to the rescue.

I have another contract about to start for a server app where all the  
data is maintained on the client's desktop, previewed with full  
functionality, and then replicated to an EC2 instance. This can be  
done with traditional databases, but it's trivial with CouchDB, which  
has allowed me to both outcompete on price and improve my development  
margin. That's one definition of success.

Furthermore, in each of these gigs, having the content in JSON is  
amazingly convenient. In some cases I need joins and dynamic queries,  
but I can do that using Couch's _external mechanism that allows  
alternative indexing.

In one case I've gone through three delivered versions, from
Java/Spring/Hibernate/Postgresql, to VisualWorks Smalltalk/GLORP/Postgresql
to CouchDB/Merb/Ruby. IMO Ruby/Smalltalk are to Java as CouchDB is to  
an RDBMS. I never want to go back. Well, apart from going back to  
Smalltalk.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

If a train station is a place where a train stops, what's a workstation?



Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
On 9 Feb 2009, at 13:49, Mister Donut wrote:

> I'll jump right in.
>
>> CouchDB won't allow you to "jump to page X", but if you look at
>> e.g. Google, it doesn't work either. [...]
>> But surrogate keys are considered harmful and I'd say (but that
>> really depends on the application), not very helpful.
>
> I guess I was assuming that CouchDB, due to its different nature, has
> a sophisticated solution for this. But apparently pagination is a
> problem that is really hard to solve.

CouchDB in its current form is very bare bones. Many of us are not
as experienced in CouchDB as in RDBMSs, just because CouchDB
hasn't been around that long. We've come a long way defining standard
patterns of how to solve common problems in CouchDB, but there's
a lot more to do.

Pagination lives and dies by being able to calculate which row lives on
which page. This works best with a surrogate index that is a sequence
over the rows. You can build a sequence on a distributed system, but
you are introducing a global queue that all requests to your system
would need to go through. You'd need to make that global queue fault
tolerant and able to handle all your load. This is tricky. Or you give
up on that and accept that sequences in a distributed system are not
feasible. This is where pagination indeed gets hard. But if you work
backwards from the user experience, you can make a decent trade-off,
see Google.


>> Can you elaborate on that? I don't quite get the "or duplicate data,
>> basically anything that needs to be the same as something else" bit.
>
> Well. Let's say you have a list of documents. You want to store some
> information about the newest document in a separate key (instead of a
> view, which might be slow? if you have too many).

Too many what? Views or documents? Views are not really slow once
the index is built, and with incremental updates they are not slow in
production either. Having many views is no problem either as they are
evaluated on-read, not on document-write (unlike traditional RDBMS
column indexes).
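
To make "evaluated on-read" and "incremental updates" concrete, here is a toy Python model. It is not how CouchDB implements views; the LazyView class, the append-only change list and the by_date map function are all assumptions for the sketch, and updates and deletions are ignored. Writes do no indexing work, and each query maps only the documents written since the previous query.

```python
class LazyView:
    """Toy index that is brought up to date on read, not on write."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows = []          # (key, value, doc_id) triples
        self.indexed_seq = 0    # how far into the change log we have indexed
        self.docs_mapped = 0    # counts map calls, to show the work done

    def query(self, changes):
        """Index only the changes that arrived since the previous query."""
        for doc in changes[self.indexed_seq:]:
            self.docs_mapped += 1
            self.rows.extend((k, v, doc["_id"]) for k, v in self.map_fn(doc))
        self.indexed_seq = len(changes)
        return sorted(self.rows, key=lambda row: row[0])


def by_date(doc):
    """Map function: key each document by its creation date."""
    yield doc["created_at"], doc["title"]


changes = [  # append-only log of written documents (a simplifying assumption)
    {"_id": "a", "created_at": "2009-02-08", "title": "Older"},
    {"_id": "b", "created_at": "2009-02-09", "title": "Newer"},
]
view = LazyView(by_date)
first = view.query(changes)        # maps both documents
changes.append({"_id": "c", "created_at": "2009-02-10", "title": "Newest"})
second = view.query(changes)       # maps only the one new document
latest = second[-1]
```

Under this model "the newest document" is simply the last row of a date-keyed view, so there is no separate key to keep in sync by hand.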


> That isn't possible.
> Or let's say you have documents, and categories. And many, many, many
> of them. Again, the view to show the latest document might be too
> slow, so you want to save that information in a separate key. Not
> possible.

Once a view is built, it is rather quick to look up things. I'm
suspecting here that you assume that views are created on demand, based
on user-input. This is not something that would work except for very few
documents and you're advised to find a solution with predefined views.


>> A couple of things you can do with CouchDB replication (again, not
>> saying, that you can't do some of those with an RDBMS but it is
>> getting harder the further you move down the list): [...]
>
> Thank you for that list. I think, and like many other users,
> considering what I have read in blogs, seem to expect something else
> from CouchDB. I am not so sure where this is coming from.
>
> Check the Ruby thing a few mails down. How exactly is that
> implementation going to work without immediate consistency?

Paul's Stuffing is for people who want to get going with CouchDB quickly
in their Rails environment. It is specifically not designed around all
concepts of CouchDB.


> Everyone
> seems to be going on about it being schema free, but you can just add
> a "param" field to any database and transparently (un)serialize and
> there you have it, schema-free.

Your alter-table statement locks your table (in MySQL). If you normalize
that out into a separate table, you add a JOIN which might end up not
being as fast as you like. Totally generic object behaviour abstractions
in SQL need something like 8 tables, there's no way this flies :)


> If you actually have a few nodes (with that implementation), it will
> break big, big time.

How? (Assuming you have a use-case in mind, can you explain that?)


> I think, possibly,
> with the "Cloud Hype", that I got into believing, that it will "just
> work". With anything that you throw at it. Like what Amazon SimpleDB
> tells you it would.

There is no magic bullet. Distributed programming is hard. :)


> Yes, Key/Value pairs are incredibly easy. MapReduce is amazing and
> intriguing. But handling the replication, won't it be so difficult
> that you end up with a Quasi-Mini-RDBMS anyway?

What is a quasi-mini-RDBMS? Of course, concepts and behaviour
will likely overlap, but there are a number of properties that draw
people to CouchDB. The REST API is one thing. JSON another.
Replication yet another and the Erlang core another, another.

Speaking of which, Erlang is pretty cool for multi-core systems
that are rather hard to program in other languages (yet again,
no silver bullet).


> Now I got far away from my original questions, but I guess that
> happens often in discussions.
>
> Basically, now: "Is it possible to handle the replication in such a
> way that you don't end up with a Mini-RDBMS anyway in the end?"

Again, can you wrap that into a concrete example? I don't quite get what
that mini-RDBMS is and how your understanding of replication ties
into that :)


> I would just, really really really, like to see an example that goes
> beyond schema-free. That handles replication. I think that would show
> where CouchDB shines, and where you'd fail with a RDBMS.

See the last three items on the list in the last mail. They are
traditionally not easy to build on top of an RDBMS in a practical or
scalable manner.

Cheers
Jan
--


Re: [user] Re: The Blog

Posted by Wout Mertens <wm...@cisco.com>.
On Feb 9, 2009, at 4:50 PM, Jan Lehnardt wrote:

> On 9 Feb 2009, at 16:18, Wout Mertens wrote:
>
>> On Feb 9, 2009, at 3:57 PM, Noah Slater wrote:
>>
>>> On Mon, Feb 09, 2009 at 09:51:18AM -0500, Adam Petty wrote:
>>>> Could this thread be added to the wiki - with only minor editing
>>>> for length - maybe as "a RDBMS vs couch 'Discussion' ?" or
>>>> something similar?"...
>>>
>>> We've learnt from the book that such comparisons tend to be harmful.
>>>
>>> They lead people into thinking that there is a direct meaningful
>>> comparison.
>>>
>>> Fundamentally, CouchDB and RDBMS solve different problems.
>>
>> I dunno, I think it would be interesting to compare the main  
>> benefits of each so that you know what the strong points of each are.
>
> Quoting myself from a few mails ago:
>
>> CouchDB solves the[sic] similar problems as an RDBMS, but starting
>> from a different angle (distributed operation instead of single-node
>> operation, CAP, yadda yadda).
>
> The differences as a consequence of different CAP-feature prioritization
> would be interesting, but then, I had hoped that the "Eventual
> Consistency"* chapter had done that.
>
> * http://books.couchdb.org/relax/eventual-consistency

Funny how nobody is really upset about the schema missing versus  
RDBMS. It seems like schemas are a lot like static typing in  
programming languages - no matter how you look at it, there are a  
*lot* of people doing just fine without it.

I remember taking Databases at university and we had to read this book  
(forgot which one but it's pretty much the standard) and the schema  
and relational concepts are very intertwined. I really liked the  
theory where you would make a fully decomposed DB schema and then use
queries to tie everything together. The thing I liked best was  
writeable queries, basically if you are careful you can automatically  
update the underlying tables except in some edge cases, thanks to a  
strict schema. However, as far as I know no current RDBMS implements  
this. Maybe there isn't much use for schemas. Non-trivial applications  
can't rely on just schemas for data validation.

A thought occurred: If you are consistent in interfacing with your  
database using only views, then you have created an ad-hoc schema.  
This is probably old news to most of you but it felt nice realizing  
it ;-)

So basically I'm saying maybe schema-free should be touted more as a  
nice-to-have feature?

Wout.


Re: [user] Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
On 9 Feb 2009, at 16:18, Wout Mertens wrote:

> On Feb 9, 2009, at 3:57 PM, Noah Slater wrote:
>
>> On Mon, Feb 09, 2009 at 09:51:18AM -0500, Adam Petty wrote:
>>> Could this thread be added to the wiki - with only minor editing
>>> for length - maybe as "a RDBMS vs couch 'Discussion' ?" or
>>> something similar?"...
>>
>> We've learnt from the book that such comparisons tend to be harmful.
>>
>> They lead people into thinking that there is a direct meaningful
>> comparison.
>>
>> Fundamentally, CouchDB and RDBMS solve different problems.
>
> I dunno, I think it would be interesting to compare the main  
> benefits of each so that you know what the strong points of each are.

Quoting myself from a few mails ago:

> CouchDB solves the[sic] similar problems as an RDBMS, but starting
> from a different angle (distributed operation instead of single-node
> operation, CAP, yadda yadda).

The differences as a consequence of different CAP-feature prioritization
would be interesting, but then, I had hoped that the "Eventual
Consistency"* chapter had done that.

* http://books.couchdb.org/relax/eventual-consistency

Cheers
Jan
--






Re: [user] Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
On 9 Feb 2009, at 18:28, Paul Davis wrote:

> On Mon, Feb 9, 2009 at 11:43 AM, Adam Petty <ad...@gmail.com> wrote:
>> If a hospital starts to move to an SOA - and all the business logic gets
>> abstracted to the web services that each department exposes in the use,
>> transmission (security) of data... wouldn't that then become almost the
>> perfect place for couch?
>>
>> Now if it's already in Oracle - I can see how it might not be smart to
>> retool just for couch, but starting from scratch, I can't think of
>> anything that goes on in a hospital that doesn't revolve around
>> physical documents.
>>
>> I've done consulting for medical records processing companies - but not
>> for hospitals themselves - any specifics as to why this wouldn't meet
>> requirements?
>>
>>
>
> I saw the schema for an Oracle db used in production at a teaching
> hospital. It still gives me nightmares. This was the general business
> database not related to things like patient records etc. Ie, it tracks
> which doctor is on what rotations in which department. AFAIK, it
> basically ran everything *except* patient records. Personally I'd
> probably pick up a commercial solution for things like medical records
> for liability alone.

I think for the sake of the argument we're assuming that we're building
a new commercial solution on top of CouchDB.


> The larger picture here though is when business logic and consistency
> are more important than availability and partition tolerance. These
> are systems that are designed to be run on a single machine with
> perhaps a hot spare etc.

A double-write proxy in front of the master & hot spare and not using
replication gives you CA!P instead of !CAP. CouchDB lets you even
make the choice. It is moar bettar! :)

Cheers
Jan
--


Re: [user] Re: The Blog

Posted by Paul Davis <pa...@gmail.com>.
On Mon, Feb 9, 2009 at 11:43 AM, Adam Petty <ad...@gmail.com> wrote:
> If a hospital starts to move to an SOA - and all the business logic gets
> abstracted to the web services that each department exposes in the use,
> transmission (security) of data... wouldn't that then become almost the
> perfect place for couch?
>
> Now if it's already in Oracle - I can see how it might not be smart to
> retool just for couch, but starting from scratch, I can't think of anything
> that goes on in a hospital that doesn't revolve around physical documents.
>
> I've done consulting for medical records processing companies - but not for
> hospitals themselves - any specifics as to why this wouldn't meet
> requirements?
>
>

I saw the schema for an Oracle db used in production at a teaching
hospital. It still gives me nightmares. This was the general business
database not related to things like patient records etc. Ie, it tracks
which doctor is on what rotations in which department. AFAIK, it
basically ran everything *except* patient records. Personally I'd
probably pick up a commercial solution for things like medical records
for liability alone.

The larger picture here though is when business logic and consistency
are more important than availability and partition tolerance. These
are systems that are designed to be run on a single machine with
perhaps a hot spare etc.

In other words, these are not your Web 2.0 "Let's have an ORMgy!" databases.

>
>
> On Mon, Feb 9, 2009 at 11:12 AM, Paul Davis <pa...@gmail.com> wrote:
>
>> On Mon, Feb 9, 2009 at 10:18 AM, Wout Mertens <wm...@cisco.com> wrote:
>> > On Feb 9, 2009, at 3:57 PM, Noah Slater wrote:
>> >
>> >> On Mon, Feb 09, 2009 at 09:51:18AM -0500, Adam Petty wrote:
>> >>>
>> >>> Could this thread be added to the wiki - with only minor editing for
>> >>> length
>> >>> - maybe as "a RDBMS vs couch 'Discussion' ?"  or something similar?"...
>> >>
>> >> We've learnt from the book that such comparisons tend to be harmful.
>> >>
>> >> They lead people into thinking that there is a direct meaningful
>> >> comparison.
>> >>
>> >> Fundamentally, CouchDB and RDBMS solve different problems.
>> >
>> > I dunno, I think it would be interesting to compare the main benefits of
>> > each so that you know what the strong points of each are.
>> >
>> > For example, suppose you implement schema-free in an RDBMS by adding a
>> > text field that contains a JSON string. You still keep some of the
>> > metadata, like _rev and _id, in proper fields.
>> >
>>
>> If you stuff a structured string into an RDBMS you're Doing It Wrong™.
>>
>> > However, thinking about that, it means you will need to re-implement
>> > everything CouchDB does, like views and replication.
>> >
>> > To be honest, I think saying RDBMS and CouchDB are for different
>> solutions
>> > is just you guys being nice. I think that any application would benefit
>> from
>> > using the CouchDB model and only in very specific, very demanding cases
>> an
>> > RDBMS would be better. I can't think of any examples though.
>> >
>> > So here's my challenge to the mailing list, it's pretty much the same one
>> > that MrDonut posted: Give us an example of something that would be better
>> be
>> > done with an RDBMS and something that would better be done with CouchDB.
>> >
>> > I'll help you: I think it would be easier to create a wiki with CouchDB
>> than
>> > with an RDBMS. It is possible in both but CouchDB just makes it easier. I
>> > suppose we'd have to ask the http://couch.it guys to know if that's
>> true.
>> >
>> > I don't know what would be done better in an RDBMS. Performance logging
>> > perhaps? Something with really stringent schema requirements?
>> >
>> > Wout.
>> >
>>
>> Things that CouchDB is better at:
>>
>> The interweb.
>>
>> Things that an RDBMS is better at:
>>
>> Huge amounts of business logic. As in the Oracle install running your
>> favorite hospital. Think along the lines of 10's and 100's of
>> thousands of lines of app logic in the DB itself.
>>
>> HTH,
>> Paul Davis
>>
>

Re: [user] Re: The Blog

Posted by Adam Petty <ad...@gmail.com>.
If a hospital starts to move to an SOA - and all the business logic gets
abstracted into the web services that each department exposes for the use
and (secure) transmission of data... wouldn't that then become almost the
perfect place for couch?

Now if it's already in Oracle - I can see how it might not be smart to
retool just for couch, but starting from scratch, I can't think of anything
that goes on in a hospital that doesn't revolve around physical documents.

I've done consulting for medical records processing companies - but not for
hospitals themselves - any specifics as to why this wouldn't meet
requirements?




On Mon, Feb 9, 2009 at 11:12 AM, Paul Davis <pa...@gmail.com> wrote:

> On Mon, Feb 9, 2009 at 10:18 AM, Wout Mertens <wm...@cisco.com> wrote:
> > On Feb 9, 2009, at 3:57 PM, Noah Slater wrote:
> >
> >> On Mon, Feb 09, 2009 at 09:51:18AM -0500, Adam Petty wrote:
> >>>
> >>> Could this thread be added to the wiki - with only minor editing for
> >>> length
> >>> - maybe as "a RDBMS vs couch 'Discussion' ?"  or something similar?"...
> >>
> >> We've learnt from the book that such comparisons tend to be harmful.
> >>
> >> They lead people into thinking that there is a direct meaningful
> >> comparison.
> >>
> >> Fundamentally, CouchDB and RDBMS solve different problems.
> >
> > I dunno, I think it would be interesting to compare the main benefits of
> > each so that you know what the strong points of each are.
> >
> > For example, suppose you implement schema-free in an RDBMS by adding a
> text
> > field that contains a JSON string. You still keep some of the metadata,
> like
> > _rev and _id, in proper fields.
> >
>
> If you stuff a structured string into an RDBMS you're Doing It Wrong™.
>
> > However, thinking about that, it means you will need to re-implement
> > everything CouchDB does, like views and replication.
> >
> > To be honest, I think saying RDBMS and CouchDB are for different
> solutions
> > is just you guys being nice. I think that any application would benefit
> from
> > using the CouchDB model and only in very specific, very demanding cases
> an
> > RDBMS would be better. I can't think of any examples though.
> >
> > So here's my challenge to the mailing list, it's pretty much the same one
> > that MrDonut posted: Give us an example of something that would be better
> be
> > done with an RDBMS and something that would better be done with CouchDB.
> >
> > I'll help you: I think it would be easier to create a wiki with CouchDB
> than
> > with an RDBMS. It is possible in both but CouchDB just makes it easier. I
> > suppose we'd have to ask the http://couch.it guys to know if that's
> true.
> >
> > I don't know what would be done better in an RDBMS. Performance logging
> > perhaps? Something with really stringent schema requirements?
> >
> > Wout.
> >
>
> Things that CouchDB is better at:
>
> The interweb.
>
> Things that an RDBMS is better at:
>
> Huge amounts of business logic. As in the Oracle install running your
> favorite hospital. Think along the lines of 10's and 100's of
> thousands of lines of app logic in the DB itself.
>
> HTH,
> Paul Davis
>

Re: [user] Re: The Blog

Posted by Paul Davis <pa...@gmail.com>.
On Mon, Feb 9, 2009 at 10:18 AM, Wout Mertens <wm...@cisco.com> wrote:
> On Feb 9, 2009, at 3:57 PM, Noah Slater wrote:
>
>> On Mon, Feb 09, 2009 at 09:51:18AM -0500, Adam Petty wrote:
>>>
>>> Could this thread be added to the wiki - with only minor editing for
>>> length
>>> - maybe as "a RDBMS vs couch 'Discussion' ?"  or something similar?"...
>>
>> We've learnt from the book that such comparisons tend to be harmful.
>>
>> They lead people into thinking that there is a direct meaningful
>> comparison.
>>
>> Fundamentally, CouchDB and RDBMS solve different problems.
>
> I dunno, I think it would be interesting to compare the main benefits of
> each so that you know what the strong points of each are.
>
> For example, suppose you implement schema-free in an RDBMS by adding a text
> field that contains a JSON string. You still keep some of the metadata, like
> _rev and _id, in proper fields.
>

If you stuff a structured string into an RDBMS you're Doing It Wrong™.

> However, thinking about that, it means you will need to re-implement
> everything CouchDB does, like views and replication.
>
> To be honest, I think saying RDBMS and CouchDB are for different solutions
> is just you guys being nice. I think that any application would benefit from
> using the CouchDB model and only in very specific, very demanding cases an
> RDBMS would be better. I can't think of any examples though.
>
> So here's my challenge to the mailing list, it's pretty much the same one
> that MrDonut posted: Give us an example of something that would be better be
> done with an RDBMS and something that would better be done with CouchDB.
>
> I'll help you: I think it would be easier to create a wiki with CouchDB than
> with an RDBMS. It is possible in both but CouchDB just makes it easier. I
> suppose we'd have to ask the http://couch.it guys to know if that's true.
>
> I don't know what would be done better in an RDBMS. Performance logging
> perhaps? Something with really stringent schema requirements?
>
> Wout.
>

Things that CouchDB is better at:

The interweb.

Things that an RDBMS is better at:

Huge amounts of business logic. As in the Oracle install running your
favorite hospital. Think along the lines of 10's and 100's of
thousands of lines of app logic in the DB itself.

HTH,
Paul Davis

Re: [user] Re: The Blog

Posted by Alan Bell <al...@theopenlearningcentre.com>.
Banking should work just fine. I wrote an application in Notes called
puffin (personal finances in Notes) and I was thinking about reviving it
as puffic or something. Each transaction is a document, each account is a
view. A document has a source account, a destination account and a
value. It appears as a positive value in the destination account and a
negative value in the source account. Thus buying a newspaper might debit
the "wallet" account and credit the "books and periodicals" expense
account. Taking cash from an ATM would debit the current account and
credit the wallet.
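
A minimal sketch of the model described above, written in plain Python
rather than as a real CouchDB view. The field names (source, destination,
value), the account names and the in-memory list are assumptions made for
illustration only. Each transaction document emits one negative row for its
source account and one positive row for its destination account, and an
account balance is the sum of the rows emitted under that account.

```python
from collections import defaultdict

# Each transaction is its own document (hypothetical field names).
transactions = [
    {"_id": "t1", "source": "current", "destination": "wallet", "value": 50},
    {"_id": "t2", "source": "wallet", "destination": "books_and_periodicals", "value": 2},
]

def map_transaction(doc):
    """Emit one row per account touched, in the spirit of a view's map step."""
    yield doc["source"], -doc["value"]       # debit the source account
    yield doc["destination"], doc["value"]   # credit the destination account

def account_balances(docs):
    """Group the emitted rows by account and sum them, like a grouped reduce."""
    balances = defaultdict(int)
    for doc in docs:
        for account, amount in map_transaction(doc):
            balances[account] += amount
    return dict(balances)

balances = account_balances(transactions)
```

Because a transaction is only ever added and never edited, two people
recording transactions at the same time write different documents and do
not contend for the same one. Every transaction contributes a debit and a
matching credit, so the balances always sum to zero.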

Adam Petty wrote:
> Well -- it sounds like couch is starting to be able to stand up to the
> fire... which is why I'm digging this thread.
>
> But yes - too much heat and the whole proprietary/RDBMS community could
> start aiming bazookas at it - which might do some damage.  So maybe some
> middle ground somewhere..
>
> I'll work on a compilation - and post it and see where the wiki takes it.  I
> would have to agree that there is something to google's jedi strategy with
> microsoft...
>
> "nothing to see here... these aren't the droids you're looking for... of
> course we're not competing with microsoft"
>
> -- and can keep that in mind also
>
> okay enough about that -
>
> Just as a frame of reference... the only thing that has held couch back for
> development at my work - has been the lack of a pluggable reporting tool.  I
> know that is really just semantics - that Pentaho can use an XML dataset -
> and JSON - XML translation seems easy BUT.... nothing out of the gate yet.
> In my case - bosses love names -- SSRS, 10g, CrystalReports, Business
> Objects..etc.
>
> -- as for an example db issue...
> For some reason -  without transactions the RDBMS people at my work seem to
> not want to consider couch for anything having to do with money.
>
> I know it would be fairly simple to have an "accounts" array field on a JSON
> user-account document - that way no single "entity's" account could be
> changed by more than one write at the same time... seems ridiculously simple
> - but is there a case where this could fail?
>
> Seems like money is always the most sensitive issue - if we could develop a
> very usable "banking" example db scenario - maybe an artificial bank app?
> and see if we can break it - or get out of sync balances due to timing
> issues -- etc?
>
> .02$
>
>
>
>
>
>
> On Mon, Feb 9, 2009 at 10:27 AM, Noah Slater <ns...@apache.org> wrote:
>
>   
>> On Mon, Feb 09, 2009 at 04:18:09PM +0100, Wout Mertens wrote:
>>     
>>> To be honest, I think saying RDBMS and CouchDB are for different
>>> solutions is just you guys being nice. I think that any application
>>> would benefit from using the CouchDB model and only in very specific,
>>> very demanding cases an RDBMS would be better. I can't think of any
>>> examples though.
>>>       
>> Not really, I just like avoiding the flames. Heh heh.
>>
>> I see where you want to go with this, and I agree that some applications
>> are
>> better suited to CouchDB, but I think it's often a blurry line, and you
>> will
>> draw fire from the RDBMS people for anything too concrete.
>>
>> --
>> Noah Slater, http://tumbolia.org/nslater
>>
>>     
>
>   


Re: [user] Re: The Blog

Posted by Justin Sheehy <ju...@iago.org>.
Adam,

On Mon, Feb 9, 2009 at 10:55 AM, Adam Petty <ad...@gmail.com> wrote:

> For some reason -  without transactions the RDBMS people at my work seem to
> not want to consider couch for anything having to do with money.

You should direct your colleagues to Pat Helland, a luminary in
RDBMSs and transactions:

http://blogs.msdn.com/pathelland/archive/2007/06/14/accountants-don-t-use-erasers.aspx

-Justin

Re: [user] Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
On 9 Feb 2009, at 16:55, Adam Petty wrote:

> -- as for an example db issue...
> For some reason -  without transactions the RDBMS people at my work  
> seem to
> not want to consider couch for anything having to do with money.

Somebody should tell them that banks have since moved
to eventually consistent data storage for all but the most
heavyweight (money-)transactions.

Cheers
Jan
--

>
>
> On Mon, Feb 9, 2009 at 10:27 AM, Noah Slater <ns...@apache.org>  
> wrote:
>
>> On Mon, Feb 09, 2009 at 04:18:09PM +0100, Wout Mertens wrote:
>>> To be honest, I think saying RDBMS and CouchDB are for different
>>> solutions is just you guys being nice. I think that any application
>>> would benefit from using the CouchDB model and only in very  
>>> specific,
>>> very demanding cases an RDBMS would be better. I can't think of any
>>> examples though.
>>
>> Not really, I just like avoiding the flames. Heh heh.
>>
>> I see where you want to go with this, and I agree that some  
>> applications
>> are
>> better suited to CouchDB, but I think it's often a blurry line, and  
>> you
>> will
>> draw fire from the RDBMS people for anything too concrete.
>>
>> --
>> Noah Slater, http://tumbolia.org/nslater
>>


Re: [user] Re: The Blog

Posted by Adam Petty <ad...@gmail.com>.
Well -- it sounds like couch is starting to be able to stand up to the
fire... which is why I'm digging this thread.

But yes - too much heat and the whole proprietary/RDBMS community could
start aiming bazookas at it - which might do some damage.  So maybe some
middle ground somewhere..

I'll work on a compilation - and post it and see where the wiki takes it.  I
would have to agree that there is something to google's jedi strategy with
microsoft...

"nothing to see here... these aren't the droids you're looking for... of
course we're not competing with microsoft"

-- and can keep that in mind also

okay enough about that -

Just as a frame of reference... the only thing that has held couch back for
development at my work - has been the lack of a pluggable reporting tool.  I
know that is really just semantics - that Pentaho can use an XML dataset -
and JSON - XML translation seems easy BUT.... nothing out of the gate yet.
In my case - bosses love names -- SSRS, 10g, CrystalReports, Business
Objects..etc.

-- as for an example db issue...
For some reason -  without transactions the RDBMS people at my work seem to
not want to consider couch for anything having to do with money.

I know it would be fairly simple to have an "accounts" array field on a JSON
user-account document - that way no single "entity's" account could be
changed by more than one write at the same time... seems ridiculously simple
- but is there a case where this could fail?

Seems like money is always the most sensitive issue - if we could develop a
very usable "banking" example db scenario - maybe an artificial bank app?
and see if we can break it - or get out-of-sync balances due to timing
issues -- etc?

.02$






On Mon, Feb 9, 2009 at 10:27 AM, Noah Slater <ns...@apache.org> wrote:

> On Mon, Feb 09, 2009 at 04:18:09PM +0100, Wout Mertens wrote:
> > To be honest, I think saying RDBMS and CouchDB are for different
> > solutions is just you guys being nice. I think that any application
> > would benefit from using the CouchDB model and only in very specific,
> > very demanding cases an RDBMS would be better. I can't think of any
> > examples though.
>
> Not really, I just like avoiding the flames. Heh heh.
>
> I see where you want to go with this, and I agree that some applications
> are
> better suited to CouchDB, but I think it's often a blurry line, and you
> will
> draw fire from the RDBMS people for anything too concrete.
>
> --
> Noah Slater, http://tumbolia.org/nslater
>

Re: [user] Re: The Blog

Posted by Noah Slater <ns...@apache.org>.
On Mon, Feb 09, 2009 at 04:18:09PM +0100, Wout Mertens wrote:
> To be honest, I think saying RDBMS and CouchDB are for different
> solutions is just you guys being nice. I think that any application
> would benefit from using the CouchDB model and only in very specific,
> very demanding cases an RDBMS would be better. I can't think of any
> examples though.

Not really, I just like avoiding the flames. Heh heh.

I see where you want to go with this, and I agree that some applications are
better suited to CouchDB, but I think it's often a blurry line, and you will
draw fire from the RDBMS people for anything too concrete.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: [user] Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
On 9 Feb 2009, at 16:56, Alan Bell wrote:

> selling airline tickets was always the classical problem you  
> couldn't do with Notes because you might overbook because of the  
> distributed system means no atomic updates and stock level checking.  
> That actually just shows how much older Notes is than the modern  
> airline where overbooking is standard policy. Anyhow an application  
> where there are multiple purchasers and a finite stock and the stock  
> levels must never ever be overcommitted probably gives the RDBMS an  
> advantage.

A single node CouchDB or a double-write (note, not
2-phase-commit) pair can handle this pretty well. It just
has limitations that true p2p setups don't have.
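
A rough sketch of how the single-node case can avoid overselling, assuming
the usual read, modify, write-with-revision cycle. The class below is a toy
in-memory stand-in, not the CouchDB API: the names SingleNodeStore, Conflict
and sell_one are hypothetical, and the integer revisions are a
simplification of real revision strings. The point is only that a write
carrying a stale revision is rejected, so a second buyer cannot take a seat
that is already gone.

```python
class Conflict(Exception):
    """Raised when a write carries a stale revision."""

class SingleNodeStore:
    """Toy in-memory stand-in for one node that checks revisions on write."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])  # return a copy, as a client would see it

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        stored = dict(doc)
        stored["_rev"] = (current["_rev"] if current else 0) + 1
        self._docs[doc["_id"]] = stored
        return stored["_rev"]

def sell_one(store, doc_id, retries=5):
    """Read, decrement, write; retry on conflict; never go below zero."""
    for _ in range(retries):
        doc = store.get(doc_id)
        if doc["stock"] <= 0:
            return False
        doc["stock"] -= 1
        try:
            store.put(doc)
            return True
        except Conflict:
            continue  # someone else wrote first: re-read and try again
    return False

store = SingleNodeStore()
store.put({"_id": "flight-42", "stock": 1})
stale = store.get("flight-42")       # a second buyer reads the same revision
first_sale = sell_one(store, "flight-42")
stale["stock"] -= 1
try:
    store.put(stale)                 # stale revision, so this is rejected
    second_sale = True
except Conflict:
    second_sale = False
```

This only holds while all writers go through the same node. With several
nodes accepting writes independently, both sales could succeed locally and
only show up as a conflict after replication, which is the limitation
mentioned above.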

Cheers
Jan
--


> Wout Mertens wrote:
>> On Feb 9, 2009, at 3:57 PM, Noah Slater wrote:
>>
>>> On Mon, Feb 09, 2009 at 09:51:18AM -0500, Adam Petty wrote:
>>>> Could this thread be added to the wiki - with only minor editing  
>>>> for length
>>>> - maybe as "a RDBMS vs couch 'Discussion' ?"  or something  
>>>> similar?"...
>>>
>>> We've learnt from the book that such comparisons tend to be harmful.
>>>
>>> They lead people into thinking that there is a direct meaningful  
>>> comparison.
>>>
>>> Fundamentally, CouchDB and RDBMS solve different problems.
>>
>> I dunno, I think it would be interesting to compare the main  
>> benefits of each so that you know what the strong points of each are.
>>
>> For example, suppose you implement schema-free in an RDBMS by  
>> adding a text field that contains a JSON string. You still keep  
>> some of the metadata, like _rev and _id, in proper fields.
>>
>> However, thinking about that, it means you will need to re- 
>> implement everything CouchDB does, like views and replication.
>>
>> To be honest, I think saying RDBMS and CouchDB are for different  
>> solutions is just you guys being nice. I think that any application  
>> would benefit from using the CouchDB model and only in very  
>> specific, very demanding cases an RDBMS would be better. I can't  
>> think of any examples though.
>>
>> So here's my challenge to the mailing list, it's pretty much the  
>> same one that MrDonut posted: Give us an example of something that  
>> would be better be done with an RDBMS and something that would  
>> better be done with CouchDB.
>>
>> I'll help you: I think it would be easier to create a wiki with  
>> CouchDB than with an RDBMS. It is possible in both but CouchDB just  
>> makes it easier. I suppose we'd have to ask the http://couch.it  
>> guys to know if that's true.
>>
>> I don't know what would be done better in an RDBMS. Performance  
>> logging perhaps? Something with really stringent schema requirements?
>>
>> Wout.
>
>


Re: [user] Re: The Blog

Posted by Alan Bell <al...@theopenlearningcentre.com>.
Selling airline tickets was always the classic problem you couldn't solve
with Notes: you might overbook, because a distributed system means no
atomic updates or stock level checking. That actually just
shows how much older Notes is than the modern airline, where overbooking
is standard policy. Anyhow, an application where there are multiple
purchasers and a finite stock, and the stock levels must never ever be
overcommitted, probably gives the RDBMS an advantage.

Wout Mertens wrote:
> On Feb 9, 2009, at 3:57 PM, Noah Slater wrote:
>
>> On Mon, Feb 09, 2009 at 09:51:18AM -0500, Adam Petty wrote:
>>> Could this thread be added to the wiki - with only minor editing for 
>>> length
>>> - maybe as "a RDBMS vs couch 'Discussion' ?"  or something similar?"...
>>
>> We've learnt from the book that such comparisons tend to be harmful.
>>
>> They lead people into thinking that there is a direct meaningful 
>> comparison.
>>
>> Fundamentally, CouchDB and RDBMS solve different problems.
>
> I dunno, I think it would be interesting to compare the main benefits 
> of each so that you know what the strong points of each are.
>
> For example, suppose you implement schema-free in an RDBMS by adding a 
> text field that contains a JSON string. You still keep some of the 
> metadata, like _rev and _id, in proper fields.
>
> However, thinking about that, it means you will need to re-implement 
> everything CouchDB does, like views and replication.
>
> To be honest, I think saying RDBMS and CouchDB are for different 
> solutions is just you guys being nice. I think that any application 
> would benefit from using the CouchDB model and only in very specific, 
> very demanding cases an RDBMS would be better. I can't think of any 
> examples though.
>
> So here's my challenge to the mailing list, it's pretty much the same 
> one that MrDonut posted: Give us an example of something that would be 
> better be done with an RDBMS and something that would better be done 
> with CouchDB.
>
> I'll help you: I think it would be easier to create a wiki with 
> CouchDB than with an RDBMS. It is possible in both but CouchDB just 
> makes it easier. I suppose we'd have to ask the http://couch.it guys 
> to know if that's true.
>
> I don't know what would be done better in an RDBMS. Performance 
> logging perhaps? Something with really stringent schema requirements?
>
> Wout.


Re: [user] Re: The Blog

Posted by Wout Mertens <wm...@cisco.com>.
On Feb 9, 2009, at 3:57 PM, Noah Slater wrote:

> On Mon, Feb 09, 2009 at 09:51:18AM -0500, Adam Petty wrote:
>> Could this thread be added to the wiki - with only minor editing  
>> for length
>> - maybe as "a RDBMS vs couch 'Discussion' ?"  or something  
>> similar?"...
>
> We've learnt from the book that such comparisons tend to be harmful.
>
> They lead people into thinking that there is a direct meaningful  
> comparison.
>
> Fundamentally, CouchDB and RDBMS solve different problems.

I dunno, I think it would be interesting to compare the main benefits  
of each so that you know what the strong points of each are.

For example, suppose you implement schema-free in an RDBMS by adding a  
text field that contains a JSON string. You still keep some of the  
metadata, like _rev and _id, in proper fields.

However, thinking about that, it means you will need to re-implement  
everything CouchDB does, like views and replication.
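
To make the thought experiment concrete, here is a small sketch using
Python's sqlite3 module. The table layout, the put and view_by_type helpers
and the integer revisions are all assumptions for illustration, not how
CouchDB or any particular product works. The JSON body lives in a text
column while _id and _rev are proper columns, and even a trivial "view" has
to be written by hand.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (_id TEXT PRIMARY KEY, _rev INTEGER NOT NULL, body TEXT NOT NULL)"
)

def put(doc_id, body, rev=None):
    """Insert a new document, or update it only if rev matches the stored one."""
    payload = json.dumps(body)
    if rev is None:
        conn.execute("INSERT INTO docs VALUES (?, 1, ?)", (doc_id, payload))
        return 1
    cur = conn.execute(
        "UPDATE docs SET _rev = _rev + 1, body = ? WHERE _id = ? AND _rev = ?",
        (payload, doc_id, rev),
    )
    if cur.rowcount != 1:
        raise ValueError("revision conflict for " + doc_id)
    return rev + 1

def view_by_type(doc_type):
    """A hand-rolled 'view': every body is parsed again on every query."""
    rows = conn.execute("SELECT _id, body FROM docs ORDER BY _id").fetchall()
    return [doc_id for doc_id, body in rows if json.loads(body).get("type") == doc_type]

put("post-1", {"type": "post", "title": "The Blog"})
put("comment-1", {"type": "comment", "post": "post-1", "text": "So."})
new_rev = put("post-1", {"type": "post", "title": "The Blog, revised"}, rev=1)
posts = view_by_type("post")
```

The revision check is the easy part. The view here rescans and reparses
every row on each call, so incremental indexing and replication would still
have to be built on top.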

To be honest, I think saying RDBMS and CouchDB are for different  
solutions is just you guys being nice. I think that any application  
would benefit from using the CouchDB model and only in very specific,  
very demanding cases an RDBMS would be better. I can't think of any  
examples though.

So here's my challenge to the mailing list, it's pretty much the same
one that MrDonut posted: Give us an example of something that would be
better done with an RDBMS and something that would be better done
with CouchDB.

I'll help you: I think it would be easier to create a wiki with  
CouchDB than with an RDBMS. It is possible in both but CouchDB just  
makes it easier. I suppose we'd have to ask the http://couch.it guys  
to know if that's true.

I don't know what would be done better in an RDBMS. Performance  
logging perhaps? Something with really stringent schema requirements?

Wout.

Re: The Blog

Posted by Noah Slater <ns...@apache.org>.
On Mon, Feb 09, 2009 at 09:51:18AM -0500, Adam Petty wrote:
> Could this thread be added to the wiki - with only minor editing for length
> - maybe as "a RDBMS vs couch 'Discussion' ?"  or something similar?"...

We've learnt from the book that such comparisons tend to be harmful.

They lead people into thinking that there is a direct meaningful comparison.

Fundamentally, CouchDB and RDBMS solve different problems.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
On 9 Feb 2009, at 15:51, Adam Petty wrote:

> Mr. Donut, Jan, Damien, others in the thread,
>
> Just want to say thank you.

Heh, thanks.

> I'm being completely serious here.  I have to agree that in the  
> beginning I
> sensed hostility from Mr. Donut - but he was hitting on many  
> questions I
> knew there were answers for and after a few minutes could  
> conceptualise -
> but never had at the tip of my tongue to give to the other RDBMS  
> guys in my
> office when they rightfully grill me about using CouchDB for the  
> next gen.
> production environment.
>
> Could this thread be added to the wiki - with only minor editing for  
> length
> - maybe as "a RDBMS vs couch 'Discussion' ?"  or something  
> similar?"...

Feel free to collect a summary and put it into the wiki :)


Cheers
Jan
--



> Mr. Donut - please continue.....
>
> And Damien - I might be wrong, but I believe the consistency Donut was
> referring to was maybe changing the type of data in a field on the back
> end midway through the app lifecycle - or maybe through a bug in the
> application.  Or conversely adding a field - which also would then be
> handled through the business layer.  I would guess this would include
> refining a view to include it if it needed to be searched on - and then
> correcting the business layer to accommodate either the new view - or
> the new key in the view.
> I was thinking this was a benefit, but it did take some redefinition of
> the scope of a database for me to understand from the get-go. Also, it
> seems to simplify the schema into truly being part of the business logic
> layer of an application.  If the data needs an application to get to the
> database - finally we can enforce that data integrity in only one place -
> but, Donut, if this isn't accurate, please reframe.
>
> (and I see Jan has answered in a similar and yet much more eloquent  
> way
> while I was typing this...)
>
>
> -- Adam
>
> On Mon, Feb 9, 2009 at 9:27 AM, Mister Donut <la...@gmail.com>  
> wrote:
>
>>> I'm suspecting here
>>> that you assume that views are created on demand, based on user- 
>>> input.
>>
>> No, I understand.
>>
>>> Totally generic object behaviour abstractions
>>> in SQL need something like 8 tables, there's no way this flies :)
>>
>> No, I was talking about the "Stuffing" implementation. All it does is
>> adding a schema-free field to an existing database? I just don't see
>> what it has anything to do with CouchDB?
>>
>>> How? (Assuming you have a use-case in mind, can you explain that?)
>>
>> Again, about the "Stuffing". It doesn't handle the lack of immediate
>> consistency. This is just what I seem to observe here. Everyone
>> praises the schema-free and JSON, but no one keeps the *eventual*
>> consistency in mind?
>>
>>> Again, can you wrap that into a concrete example, I don't quite  
>>> get what
>>> that mini-RDBMS is and how your understanding of replication ties
>>> into that :)
>>
>> You have to deal with the *eventual* consistency in your applications
>> don't you? And isn't that incredibly hard and expensive? I mean just
>> think about the end user, when he might put something in CouchDB, but
>> not immediately see it, in fact, it might be gone for a very long
>> time.
>> What interactive application can work with that?
>>
>>> I have another contract about to start for a server app where all  
>>> the
>>> data is maintained on the client's desktop, previewed with full
>>> functionality, and then replicated to an EC2 instance. This can be
>>> done with traditional databases, but it's trivial with CouchDB,
>>
>> Well, this is trivial with all databases? Just import and export.  
>> It's
>> just copying a file. Now imagine two users working on the data. Yes,
>> you have replication built in, so no data gets lost. But you still
>> need to figure out all the merging? Hum.
>>


Re: The Blog

Posted by Adam Petty <ad...@gmail.com>.
Mr. Donut, Jan, Damien, others in the thread,

Just want to say thank you.

I'm being completely serious here.  I have to agree that in the beginning I
sensed hostility from Mr. Donut - but he was hitting on many questions I
knew there were answers for and after a few minutes could conceptualise -
but never had at the tip of my tongue to give to the other RDBMS guys in my
office when they rightfully grill me about using CouchDB for the next gen.
production environment.

Could this thread be added to the wiki - with only minor editing for length
- maybe as "a RDBMS vs couch 'Discussion' ?"  or something similar?"...

Mr. Donut - please continue.....

And Damien - I might be wrong, but I believe the consistency Donut was
referring to was maybe changing the type of data in a field on the back
end midway through the app lifecycle - or maybe through a bug in the
application.  Or conversely adding a field - which also would then be handled
through the business layer.  I would guess this would include refining a
view to include it if it needed to be searched on - and then correcting the
business layer to accommodate either the new view - or the new key in the
view.
I was thinking this was a benefit, but it did take some redefinition of the
scope of a database for me to understand from the get-go. Also, it seems to
simplify the schema into truly being part of the business logic layer of an
application.  If the data needs an application to get to the database -
finally we can enforce that data integrity in only one place - but, Donut, if
this isn't accurate, please reframe.

 (and I see Jan has answered in a similar and yet much more eloquent way
while I was typing this...)

-- Adam

On Mon, Feb 9, 2009 at 9:27 AM, Mister Donut <la...@gmail.com> wrote:

> > I'm suspecting here
> > that you assume that views are created on demand, based on user-input.
>
> No, I understand.
>
> > Totally generic object behaviour abstractions
> > in SQL need something like 8 tables, there's no way this flies :)
>
> No, I was talking about the "Stuffing" implementation. All it does is
> adding a schema-free field to an existing database? I just don't see
> what it has anything to do with CouchDB?
>
> > How? (Assuming you have a use-case in mind, can you explain that?)
>
> Again, about the "Stuffing". It doesn't handle the lack of immediate
> consistency. This is just what I seem to observe here. Everyone
> praises the schema-free and JSON, but no one keeps the *eventual*
> consistency in mind?
>
> > Again, can you wrap that into a concrete example, I don't quite get what
> > that mini-RDBMS is and how your understanding of replication ties
> > into that :)
>
> You have to deal with the *eventual* consistency in your applications
> don't you? And isn't that incredibly hard and expensive? I mean just
> think about the end user, when he might put something in CouchDB, but
> not immediately see it, in fact, it might be gone for a very long time.
> What interactive application can work with that?
>
> > I have another contract about to start for a server app where all the
> > data is maintained on the client's desktop, previewed with full
> > functionality, and then replicated to an EC2 instance. This can be
> > done with traditional databases, but it's trivial with CouchDB,
>
> Well, this is trivial with all databases? Just import and export. It's
> just copying a file. Now imagine two users working on the data. Yes,
> you have replication built in, so no data gets lost. But you still
> need to figure out all the merging? Hum.
>

Re: The Blog

Posted by Damien Katz <da...@apache.org>.
On Feb 9, 2009, at 9:43 AM, Mister Donut wrote:

>> Actually that's not true. We aren't using a system like SimpleDB
>> where your changes might not be immediately available in subsequent
>> queries by the same user.
>
> Isn't the point of replication to handle massive reads?

No, it's primarily for offline access and distributed edits. But it  
can be used for spreading read (and update) load too.

> If so, you'd
> be using load balancing? So you cannot predict which server will be
> chosen?

Seriously? Only if you were using a randomized load balancer, but why  
are you making that assumption?
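
One way to picture a non-randomized balancer is deterministic routing by
user, sketched below. The host names and the replica_for helper are
hypothetical, and hashing the user id is just one possible policy. The same
user always lands on the same replica, so they read from the node that took
their writes.

```python
import hashlib

# Hypothetical replica hosts, for illustration only.
REPLICAS = ["couch-a.example", "couch-b.example", "couch-c.example"]

def replica_for(user_id, replicas=REPLICAS):
    """Map a user to one replica deterministically, so the same user
    always reads from the node that received their writes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]
```

Other users may still see a change late, until replication catches up, and
changing the replica list remaps some users. Both are trade-offs of this
particular sketch.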

-Damien

Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
On 9 Feb 2009, at 15:43, Mister Donut wrote:

>> Actually that's not true. We aren't using a system like SimpleDB
>> where your changes might not be immediately available in subsequent
>> queries by the same user.
>
> Isn't the point of replication to handle massive reads? If so, you'd
> be using load balancing? So you cannot predict which server will be
> chosen?

Read-scaling is just _one_ possible application. And this would
work the same way as, e.g., MySQL read scaling, and there are
several ways other smart people have handled that for their own
use-cases; plenty of sources to steal from. All this is very
application-specific again and cannot be solved in the general
case.

Also, see Damien.

Cheers
Jan
--



Re: The Blog

Posted by Paul Davis <pa...@gmail.com>.
You're doing a great job distilling the greater discussion directly
into a good overview on using CouchDB. Keep up the good work.

On Mon, Feb 9, 2009 at 11:50 PM, Mister Donut <la...@gmail.com> wrote:
> Wouks, too many replies!
>
> I learned a lot by just reading. I will just reply to a few comments.
>
>> To be honest, I think saying RDBMS and CouchDB are for different
>> solutions is just you guys being nice. I think that any application
>> would benefit from using the CouchDB model and only in very specific,
>> very demanding cases an RDBMS would be better. I can't think of any
>> examples though.
>
> Yes, see, this is what I started to believe as well. But this thread
> showed me that this idea is wrong. RDBMS are there and were written
> for a reason. CouchDB solves a different problem. It's not just a new
> storage layer that can be plugged into any existing (RDBMS) database
> abstraction layer and then "it just works".
>
> http://couchdb.apache.org/docs/intro.html
>

My position is that most people using an RDBMS don't actually need
one. My general divining rod is, "If you're using a 'web framework'
and an 'Object Relational Mapper', then chances are you're Doing It
Wrong ™." But your general outline is correct.

> "An object-oriented database. Or more specifically, meant to function
> as a seamless persistence layer for an OO programming language."
>

I just wanted to point out that this quote is pulled from the very
prominent "What CouchDB is Not" section. The context here made that a
bit hard to follow.

>> I'll help you: I think it would be easier to create a wiki with
>> CouchDB than with an RDBMS. It is possible in both but CouchDB just
>> makes it easier. I suppose we'd have to ask the http://couch.it guys
>> to know if that's true.
>
> Well. How does CouchDB make it easier? I think I'd be easier on some
> parts, and harder on other parts. As I said, I don't think (anymore)
> that CouchDB is supposed to replace a RDBMS, but instead solve a
> different problem.
>
> As soon as you need to scale horizontally, replication comes into
> play, think Wikipedia. Because of the eventual consistency, you might
> have many different versions of pages "live". Just think what happens
> when users start to edit and save old versions. This is a very
> interesting read
>
> http://www.facebook.com/note.php?note_id=23844338919

That article is definitely a good read for anyone thinking about
issues in replication. But make sure you understand the differences
between CouchDB-style replication and MySQL-style replication. MySQL
is (AFAIK; I've only read about it) a log-replay-style replication
system. Interruptions in the log replay are very disruptive. OTOH,
CouchDB-style replication is incremental and doesn't require a
constant, uninterruptible connection.
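
To make "incremental" a little more concrete, here is a toy model of
checkpoint-based replication. It is only a sketch and not CouchDB's actual
replicator: the `replicate` function, the shape of the change records, and
the `maxBatch` cut-off (standing in for a dropped connection) are all
assumptions made for illustration. It also assumes the change list is
already ordered by sequence number.

```javascript
// Toy model, not CouchDB's replicator: the source keeps an ordered list of
// changes, each tagged with a sequence number; the target remembers the
// last sequence it applied (the checkpoint).
function replicate(sourceChanges, target, maxBatch) {
  // Only changes newer than the checkpoint still need to be copied.
  const pending = sourceChanges.filter(c => c.seq > target.checkpoint);
  for (const change of pending.slice(0, maxBatch)) {
    target.docs[change.id] = change.doc;
    target.checkpoint = change.seq;
  }
  return target;
}

const source = [
  { seq: 1, id: "a", doc: { v: 1 } },
  { seq: 2, id: "b", doc: { v: 1 } },
  { seq: 3, id: "a", doc: { v: 2 } },
];
const target = { checkpoint: 0, docs: {} };

replicate(source, target, 2); // pretend the connection drops after two changes
replicate(source, target, 2); // a later run resumes from the checkpoint
console.log(target.checkpoint, JSON.stringify(target.docs));
```

Because the target only ever asks for changes past its checkpoint, an
interrupted run loses nothing; the next run simply picks up where the last
one stopped.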

>
> About cache invalidation. I just don't think that, as soon as you are
> forced to use replication, which is the whole point of CouchDB?
> Clouds? Scale horizontally? you can actually build a typical web
> application (wiki, forum, blog) that doesn't give the user a
> consistent experience.
>

If I understand correctly, you're saying that replication doesn't
allow building a webapp that provides a consistent experience.
Assuming that's what you meant, I don't think that's quite right. Just
like the Facebook article, there are strategies you can use for sticky
sessions etc to make sure that users are reading from the server that
accepted their write. Any time you end up with noticeable propagation
delays you'll run into interesting problem spaces like this. And by no
means is replication 'the whole point' of CouchDB.
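
As a rough illustration of the sticky-session idea, the sketch below pins
each user to one replica by hashing the user ID, so reads go to the server
that took that user's writes. The server names, the `pickServer` function,
and the hash are hypothetical; a real deployment would more likely configure
this in the load balancer itself.

```javascript
// Hypothetical sketch: route each user to a fixed replica so that reads
// hit the same server that accepted that user's writes.
function hashString(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit int
  }
  return h;
}

function pickServer(userId, servers) {
  return servers[hashString(userId) % servers.length];
}

// Illustrative server names, not real hosts.
const servers = ["couch-a.example", "couch-b.example", "couch-c.example"];
const writeTarget = pickServer("alice", servers);
const readTarget = pickServer("alice", servers);
console.log(writeTarget === readTarget); // same user, same server
```

The choice is deterministic, which is the opposite of the randomized load
balancer assumed earlier in the thread.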

> Now, if you build something like Antony Blakey (#9 in this thread),
> that seems like a really great idea on how to use CouchDB.
>

Lots of desktop CouchDB installs is definitely an exciting mode of operation.

>> I know it would be fairly simple to have an "accounts" array field on a JSON
>> user-account document - that way no single "enities" account could be
>> changed by more than one write at the same time... seems rediculously simple
>> - but is there a case where this could fail?
>
> Well, isn't the standard example:
>
> Person A only has $500.
>
> 1 Check A's account: $500
> 2 Set A's account: $0
> 3 Check B's account: $1000
> 4 Set B's account: $1500
>
> Now, if at any time between 1 and 2 or 3 and 4 you modify A's or B's
> account, you have lost.
>
> This is where it could fail? Assuming the four actions are not sent in
> the same request.
>
> Again. This is RDBMS thinking. In CouchDB, the balance should probably
> be a view. But there is still no way to enforce that you have enough
> money on your account before you can withdraw. Is there?
>
> Check Balance
> <- Other Instance Withdraws
> Withdraw
>
>> Things that CouchDB is better at:
>> The interweb.
>> Things that an RDBMS is better at:
>> Huge amounts of business logic. As in the Oracle install running your
>> favorite hospital. Think along the lines of 10's and 100's of
>> thousands of lines of app logic in the DB itself.
>
> You know, I am trying really hard, but these comments just contribute
> absolutely nothing to the discussion.
>

I apologize that my humor overshadowed the message. Distilling a bit, I
would rephrase it as, "If you require consistency above
availability and partition tolerance, CouchDB may not be the right
hammer for your nail."

More thoroughly, CouchDB is good in terms of the model of the web
itself. The World Wide Web is not consistent. Documents are not always
valid. Hyperlinks can and do break regularly. There are a huge number
of errors in the system. And yet it chugs on merrily.

On the opposite end of the spectrum, we have extremely large RDBMS
installs on huge iron. IIRC, I read an article that the 37signals crew
just bought a 32 GiB machine to scale up Basecamp. Single machine
running a highly available and highly consistent RDBMS. Now, if you
tried splitting that single database over multiple physical nodes
(which is what the article I read was arguing against) the whole
system would require many man hours of systems engineering or a huge
rewrite of the base application logic.

Another thought that just occurred to me. Another way of describing
the difference is that in CouchDB the data is important. In an RDBMS,
it's the relations that are important (or the focus at least).

>> You can do that with Map/Reduce.
>> Create a view that gets all the comments and get them with limit=0,
>> there's your counter.
>
> No, you cannot. *Variable* criteria. A Map/Reduce is a fixed criteria.
> Also, a counter in the most abstract meaning. The only way to count
> something in CouchDB is to add every item to the database and then use
> a view. There is no +=. And there is no way to aggregate the count
> into a single key.

Yes, you can. Just not in the way you're used to thinking. A
Map/Reduce view is a fixed *mapping* from documents to a sorted
key/value space. The important part to note is the word 'sorted'. It's
precisely this sorting of your emitted key that allows you to select
specific records based on variable criteria. In CouchDB queries, you
spend time thinking in advance about how to get things to sort into an
order from which you can select a contiguous slice.

Also, you may count things. Patrick's original example gave you a
quick way to count a type of document. The reduce implementation
*defaults* to producing a single output value. If you emitted the
values [-1, 10, 3, -5] for any set of keys, and your reduce function
was simply "return sum(values)", you would get an output value of 7.
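
The arithmetic above can be checked with a small sketch. The `sum` helper
here is a local stand-in for the one CouchDB's JavaScript view server
provides, and `reduceFn` is just an illustrative name; the
(keys, values, rereduce) signature is the shape CouchDB uses for reduce
functions.

```javascript
// Stand-in for the sum() helper available inside CouchDB view functions.
function sum(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// A reduce function in the (keys, values, rereduce) shape.
// Plain summing is safe for both the reduce and the rereduce pass.
function reduceFn(keys, values, rereduce) {
  return sum(values);
}

const total = reduceFn(null, [-1, 10, 3, -5], false);

// Rereduce: partial sums from two chunks combine to the same total.
const partials = [
  reduceFn(null, [-1, 10], false),
  reduceFn(null, [3, -5], false),
];
const combined = reduceFn(null, partials, true);
console.log(total, combined);
```

Summing partial results gives the same answer as summing everything at once,
which is what lets the view be updated incrementally.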

> Patrick, if you aren't trying, then don't. There are enough people who
> actually try.
>

Patrick was trying to help and was correct. Being rude after
misunderstanding his comments will not garner you much good will.

HTH,
Paul Davis

Re: The Blog

Posted by Alan Bell <al...@theopenlearningcentre.com>.
Jan Lehnardt wrote:
>  
> docs in the meantime that can be later deleted. Much like a log-file
> analyzer + logrotation. Not saying it is the best idea, I'd use something
> else for that (or would I, watch dev@ for news
This sounds interesting. I had a browse in the archives but couldn't
find it. Is there a subject line already under discussion, or something
you may bring up soon? I am a bit interested in round-robin databases
like RRDtool at the moment, where logged data is incrementally aggregated
going back in time.
> ), plus, this is about your
> data, so you'd not just count arbitrary things but you'd have a bunch of
> documents representing your records and you'd be able to sum them
> up just nice.
>
>
>


Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
Hi,

Mister D, you made a convincing case that you had done your
homework, but you haven't, and several people have told you so by now
(without backing it up, because they assumed you'd understand). Please
don't get cranky.

Everybody, this thread is getting personal and leaving the
technical behind. Let's not do that, ok? My patience is growing thinner
too, but let's show some code.

--

Mister D, you led this thread under the assumption* that a lot of people
are not getting CouchDB and that we should work on our promotion
and documentation. You have a point, but you're also seeing only
one side. There's a large group of people who do get CouchDB
for what it is. Just because some people blog about their experiences
with something (the blog) that uses only a limited feature set of
CouchDB doesn't mean they don't get it. (Maybe they don't, but that's
okay: their blog doesn't need to be highly scalable; they are just tired
of writing SQL, fighting an ORM, or being lost in driver-dependency hell.)

*You never said so explicitly, but your arguments lead me to believe
that. If that's not the case, can you clarify?


>> Yes, you can. Just not in the way you're used to thinking. A
>> Map/Reduce view is a fixed *mapping* from documents to a sorted
>> key/value space.
>
> Yes, you just said it. *Fixed*. If you have 200 documents, 100 from
> Jan to Nov, and 100 from Nov to Dec, there is no way you can fill them
> into two buckets ("Jan-Nov" and "Nov-Dec"). It would require variable
> conditions.

docs:

{"_id":"foo","date":[2008, 1, 1],"data":"abc"}
{"_id":"bar","date":[2008, 2, 1],"data":"def"}
{"_id":"baz","date":[2008, 3, 1],"data":"abc"}
{"_id":"qux","date":[2008, 4, 1],"data":"ghi"}
{"_id":"quux","date":[2008, 5, 1],"data":"jkl"}
{"_id":"corge","date":[2008, 6, 1],"data":"mno"}
{"_id":"grault","date":[2008, 7, 1],"data":"pqr"}
{"_id":"garply","date":[2008, 8, 1],"data":"stu"}
{"_id":"waldo","date":[2008, 9, 1],"data":"vwx"}
{"_id":"fred","date":[2008, 10, 1],"data":"yza"}
{"_id":"plugh","date":[2008, 11, 1],"data":"bcd"}
{"_id":"xyzzy","date":[2008, 12, 1],"data":"efg"}

map function:

function(doc) {
   emit(doc.date, doc.data);
}

result:

{"key":[2008, 1, 1],"value":"abc","id":"foo"}
{"key":[2008, 2, 1],"value":"def","id":"bar"}
{"key":[2008, 3, 1],"value":"abc","id":"baz"}
{"key":[2008, 4, 1],"value":"ghi","id":"qux"}
{"key":[2008, 5, 1],"value":"jkl","id":"quux"}
{"key":[2008, 6, 1],"value":"mno","id":"corge"}
{"key":[2008, 7, 1],"value":"pqr","id":"grault"}
{"key":[2008, 8, 1],"value":"stu","id":"garply"}
{"key":[2008, 9, 1],"value":"vwx","id":"waldo"}
{"key":[2008, 10, 1],"value":"yza","id":"fred"}
{"key":[2008, 11, 1],"value":"bcd","id":"plugh"}
{"key":[2008, 12, 1],"value":"efg","id":"xyzzy"}

Not sure what you exactly propose, but streaming
the view result to the client and cutting at a date
where the month marker bumps to 10 seems pretty
reasonable to get this into two buckets.

If you permit two requests:

?endkey=[2008,10] // everything before October
?startkey=[2008,10] // everything from October on
?startkey=[2008,10]&endkey=[2009] // the same, bounded to 2008, if you permit more docs and 2009 data.

Index operations are your "variable conditions".
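
To show why a range over sorted keys behaves like a variable condition, here
is a small in-memory imitation of a view query. It is a sketch only:
`compareKeys` handles nothing but arrays of numbers (CouchDB's real collation
covers many more types), and `queryView` is a hypothetical helper, not a
CouchDB API. The one rule it borrows is that a shorter array sorts before a
longer one sharing its prefix, so [2008, 10] sorts before [2008, 10, 1].

```javascript
// Simplified stand-in for view key collation: arrays of numbers only.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length; // a shorter prefix sorts first
}

// Hypothetical helper: sort the rows by key, then keep an inclusive range.
function queryView(rows, opts = {}) {
  return rows
    .slice()
    .sort((x, y) => compareKeys(x.key, y.key))
    .filter(r =>
      (opts.startkey === undefined || compareKeys(r.key, opts.startkey) >= 0) &&
      (opts.endkey === undefined || compareKeys(r.key, opts.endkey) <= 0));
}

// One row per month of 2008, keyed like the emit(doc.date, ...) rows above.
const rows = [];
for (let month = 1; month <= 12; month++) {
  rows.push({ key: [2008, month, 1], value: "doc-" + month });
}

const firstBucket = queryView(rows, { endkey: [2008, 10] });    // January to September
const secondBucket = queryView(rows, { startkey: [2008, 10] }); // October to December
console.log(firstBucket.length, secondBucket.length);
```

The map function never changes; only the cut point passed at query time
does, and it can be any date the caller likes.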


>> Also, you may count things.
>
> I never said you couldn't. I said you cannot count like += and you
> cannot aggregate counts to get rid of all the documents. Let's say you
> want to count pageviews. Easy, insert a document for every pageview,
> create a "sum-view". But, this will lead to way too many documents?
> Doesn't seem feasible. Of course, CouchDB isn't the tool for that job,
> but I would still like to see some really hands on examples of what
> CouchDB can do. I think we covered the concepts now.

Run a cronjob that does the roll-up for you periodically, and use single
docs in the meantime that can be deleted later. Much like a log-file
analyzer plus log rotation. I'm not saying it is the best idea; I'd use
something else for that (or would I? Watch dev@ for news). Plus, this is
about your data, so you'd not just count arbitrary things: you'd have a
bunch of documents representing your records, and you'd be able to sum
them up nicely.
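
A minimal sketch of what such a roll-up job might compute, assuming one
document per pageview. The document shapes (`type`, `page`, `day`,
`count`) and the `rollUp` function are invented for illustration. The
`_deleted: true` stubs follow the form CouchDB accepts for deleting documents
in a bulk update, so the returned object could be sent to `_bulk_docs`; the
actual HTTP call is left out.

```javascript
// Hypothetical document shape: {_id, _rev, type: "pageview", page, day}.
function rollUp(pageviewDocs) {
  // Count pageviews per (page, day) pair.
  const counts = new Map();
  for (const doc of pageviewDocs) {
    const key = JSON.stringify([doc.page, doc.day]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // One summary document per pair replaces the individual pageview docs.
  const summaries = Array.from(counts, ([key, count]) => {
    const [page, day] = JSON.parse(key);
    return { type: "pageview_summary", page, day, count };
  });
  // Deletion stubs for the originals.
  const deletions = pageviewDocs.map(doc => ({
    _id: doc._id, _rev: doc._rev, _deleted: true,
  }));
  return { docs: summaries.concat(deletions) };
}

const batch = rollUp([
  { _id: "v1", _rev: "1-a", type: "pageview", page: "/blog", day: "2009-02-09" },
  { _id: "v2", _rev: "1-b", type: "pageview", page: "/blog", day: "2009-02-09" },
  { _id: "v3", _rev: "1-c", type: "pageview", page: "/about", day: "2009-02-09" },
]);
console.log(JSON.stringify(batch, null, 2));
```

A sum view that emits 1 for each pageview document and `count` for each
summary document would report the same totals before and after the roll-up.
A real job would also have to cope with pageviews that arrive while it runs.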


>> Patrick was trying to help and was correct.
>
> No, he is not.

We need to get into a concrete example before we can solve this
one. I assume there's just a mismatch of assumptions. We could
also let it rest.


Cheers
Jan
--


Re: The Blog

Posted by Paul Davis <pa...@gmail.com>.
On Tue, Feb 10, 2009 at 1:34 AM, Mister Donut <la...@gmail.com> wrote:
>> On the opposite end of the spectrum, we have extremely large RDBMS
>> installs on huge iron. IIRC, I read an article that the 37signals crew
>> just bought a 32 GiB machine to scale up Basecamp.
>> the whole system would require many man hours of systems engineering
>> or a huge rewrite of the base application logic.
>
> Yeah but CouchDB doesn't magically solve that problem, does it?
> RDBMS + Memcached goes a very long way.
>

Of course CouchDB doesn't 'magically solve that problem'.

> Also, Basecamp seems to be "easy" to partition, "like" Flickr, (mind
> you, "easy"!), because most accounts are "self-contained". There is a
> project, a few users. They don't overlap. Of course, once you detach
> users from their projects, ... or allow users to comment on
> everything, that's where it gets hard? The problems start when
> everything relates to everything. (see:)
>

The blog post I read was saying exactly the opposite (at least about Basecamp):

http://www.37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding

>> Another thought that just occurred to me. Another way of describing
>> the difference is that in CouchDB the data is important. In an RDBMS,
>> it's the relations that are important (or the focus at least).
>
> That is a very interesting point. I tend to agree after these emails.
> Also, most example applications of CouchDB that users have presented
> in this very thread seem to be about data, and not about relations.
> The sync to S3, the message queue with included aggregating
> reporting... Whereas a typical web application (wiki, blog),
> everything is about relations. Isn't that so? Users in user groups
> with permissions writing posts belonging to categories, having
> comments by other users. I don't really see how you can just throw
> that out of the window?! I mean, exactly why would you use the
> key/value pairs user_x, permission_x, group_x, entry_x, comment_x
> instead of just five tables. I don't understand where the CouchDB
> implementation shines for just exactly that thing. Mind you, this is
> RDBMS thinking (again), and I totally see the reason to use CouchDB
> for those projects outlined here (S3, Queue).
>

My line from the other day is that CouchDB is relational as the web is
relational. It can have references, but those references can break
just like a hyperlink can. Most people don't spend a huge amount of
time worrying about it. Which raises the question of why some hyperlinks
(intrasite) are orders of magnitude more important than others
(intersite).

Bottom line: Relax.

> I think there is a bit of a problem of the approach here. Most
> "convinced" CouchDB users seem to want to tell you "Well RDBMS suck,
> CouchDB is much better", "Why?", "Well, because... [Concepts,
> Key/Value, Map/Reduce]". Instead of trying to show you a few hands-on
> approaches of solving a problem *that has not been solved* by RDBMS
> yet.
>

Erm. No. On all counts. If you follow the community for a bit longer
you'll see that most people are generally enthused about the feature
set that CouchDB provides. In the general sense most people have a
fairly intuitive feeling for what would be better suited in CouchDB
vs. an RDBMS. There is some gray area and overlap which causes a fair
amount of discussion on particulars (as evidenced by this thread) but
I wouldn't say that requires anyone to solve 'a problem *that has not
been solved* by [an] RDBMS'. Such a demand is preposterous.

> I think the message reporting queue with aggregating, an example like
> that, instead of just "The Blog in 5 Minutes" you see everywhere,
> would go a long way into showing what CouchDB is all about.
>

The "Blog in 5 minutes" while somewhat tired, is instantly understood.
Consider it the "Hello, World!" of Web 2.0 if you must.

> It can show how useful Map/Reduce can be (to create aggregate
> reports), and how you can possibly have two message queues that can
> stay in sync.
>
>> Yes, you can. Just not in the way you're used to thinking. A
>> Map/Reduce view is a fixed *mapping* from documents to a sorted
>> key/value space.
>
> Yes, you just said it. *Fixed*. If you have 200 documents, 100 from
> Jan to Nov, and 100 from Nov to Dec, there is no way you can fill them
> into two buckets ("Jan-Nov" and "Nov-Dec"). It would require variable
> conditions.
>

No, I said 'fixed *mapping*'. And I could write you an example in a few
lines of code. But I'm not going to because you've worn my patience
thin.

>> Also, you may count things.
>
> I never said you couldn't. I said you cannot count like += and you
> cannot aggregate counts to get rid of all the documents. Let's say you
> want to count pageviews. Easy, insert a document for every pageview,
> create a "sum-view". But, this will lead to way too many documents?
> Doesn't seem feasible. Of course, CouchDB isn't the tool for that job,
> but I would still like to see some really hands on examples of what
> CouchDB can do.

You'll need to spend more time learning CouchDB before you presume to
know its limitations and the trade-offs of the approaches that would
be most appropriate for a specific use case.

> I think we covered the concepts now.
>

I don't think I've covered the concepts after months of working with CouchDB.

>> Patrick was trying to help and was correct.
>
> No, he is not.
>

You forgot to quote the part about how being rude and wrong does not
make a good impression. Twice in a row is exceedingly grating.

HTH,
Paul Davis

Re: The Blog

Posted by Patrick Antivackis <pa...@gmail.com>.
I don't think most "convinced" CouchDB users want to say "Well, RDBMS suck,
CouchDB is much better". The world is not monolithic, and the
one-size-fits-all concept is not very efficient.

Lots of projects have arisen to create "document databases": see the JCR (JSR
170), represented in the Apache world by Jackrabbit; see CouchDB; see TYPO3
(a PHP CMS), which is also working on JCR-like storage in the PHP world. The
aim of these solutions is not to be THE answer to ALL problems, just
to provide a robust solution for document management and web CMS.

I have used JCR for 3 years and CouchDB for 8 months, and I'm happy with
them. They have been really efficient for the projects I used them for (CMS,
Digital Asset Management, queue system, lite data mining).

Now, I will not use them when I need to do an ERP project; I will then use
something like OFBiz (Apache) with its schema of 800 tables.

Each problem has its solution, and some can have multiple solutions.
Choosing the right solution is rarely black or white; it's a trade-off. RDBMS
are great. Document databases are great too.

As for referential constraints, that's a trade-off too. Look at one of the
most used databases on the internet, MySQL with its MyISAM engine: where are
the constraints?

Nobody asks you to choose CouchDB if you don't want to. But if you want to
open your mind and accept that RDBMS are not the right solution for all
problems, maybe CouchDB can help you.





2009/2/10 Mister Donut <la...@gmail.com>

> > On the opposite end of the spectrum, we have extremely large RDBMS
> > installs on huge iron. IIRC, I read an article that the 37signals crew
> > just bought a 32 GiB machine to scale up Basecamp.
> > the whole system would require many man hours of systems engineering
> > or a huge rewrite of the base application logic.
>
> Yeah but CouchDB doesn't magically solve that problem, does it?
> RDBMS + Memcached goes a very long way.
>
> Also, Basecamp seems to be "easy" to partition, "like" Flickr, (mind
> you, "easy"!), because most accounts are "self-contained". There is a
> project, a few users. They don't overlap. Of course, once you detach
> users from their projects, ... or allow users to comment on
> everything, that's where it gets hard? The problems start when
> everything relates to everything. (see:)
>
> > Another thought that just occurred to me. Another way of describing
> > the difference is that in CouchDB the data is important. In an RDBMS,
> > it's the relations that are important (or the focus at least).
>
> That is a very interesting point. I tend to agree after these emails.
> Also, most example applications of CouchDB that users have presented
> in this very thread seem to be about data, and not about relations.
> The sync to S3, the message queue with included aggregating
> reporting... Whereas a typical web application (wiki, blog),
> everything is about relations. Isn't that so? Users in user groups
> with permissions writing posts belonging to categories, having
> comments by other users. I don't really see how you can just throw
> that out of the window?! I mean, exactly why would you use the
> key/value pairs user_x, permission_x, group_x, entry_x, comment_x
> instead of just five tables. I don't understand where the CouchDB
> implementation shines for just exactly that thing. Mind you, this is
> RDBMS thinking (again), and I totally see the reason to use CouchDB
> for those projects outlined here (S3, Queue).
>
> I think there is a bit of a problem of the approach here. Most
> "convinced" CouchDB users seem to want to tell you "Well RDBMS suck,
> CouchDB is much better", "Why?", "Well, because... [Concepts,
> Key/Value, Map/Reduce]". Instead of trying to show you a few hands-on
> approaches of solving a problem *that has not been solved* by RDBMS
> yet.
>
> I think the message reporting queue with aggregating, an example like
> that, instead of just "The Blog in 5 Minutes" you see everywhere,
> would go a long way into showing what CouchDB is all about.
>
> It can show how useful Map/Reduce can be (to create aggregate
> reports), and how you can possibly have two message queues that can
> stay in sync.
>
> > Yes, you can. Just not in the way you're used to thinking. A
> > Map/Reduce view is a fixed *mapping* from documents to a sorted
> > key/value space.
>
> Yes, you just said it. *Fixed*. If you have 200 documents, 100 from
> Jan to Nov, and 100 from Nov to Dec, there is no way you can fill them
> into two buckets ("Jan-Nov" and "Nov-Dec"). It would require variable
> conditions.
>
> > Also, you may count things.
>
> I never said you couldn't. I said you cannot count like += and you
> cannot aggregate counts to get rid of all the documents. Let's say you
> want to count pageviews. Easy, insert a document for every pageview,
> create a "sum-view". But, this will lead to way too many documents?
> Doesn't seem feasible. Of course, CouchDB isn't the tool for that job,
> but I would still like to see some really hands on examples of what
> CouchDB can do. I think we covered the concepts now.
>
> > Patrick was trying to help and was correct.
>
> No, he is not.
>

Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
All, please do not respond to this mail.

Cheers
Jan
--

On 10 Feb 2009, at 11:39, Mister Donut wrote:

> Anyway, I won't post anymore. Apparently it's like Ruby on Rails,
> superiority and if you don't like it, then leave. So, good luck. (Omg
> we don't want you anyway, lololol!)
>
> I found this poking around the interwebs, following some blogs, found
> a few entries, and here I was. The wiki left me confusing because a
> lot of it is not yet done (Yes I know you are not done yet).
>
> Two users shared two projects, which I am really thankful for, they
> showed me how you could make use of CouchDBs features (funnily enough,
> neither of them was a web application).
>
>> which just uses a limited feature set of CouchDB doesn't
>> mean they don't get it.
>
> Well I wasn't here to find out what you can just as well with anything
> else that allows you to store and retrieve.
>
>> Not sure what you exactly propose, but streaming
>> the view result to the client and cutting at a date
>> where the month marker bumps to 10 seems pretty
>> reasonable to get this into two buckets.
>
> A view that gives you the Jan-Nov documents in the first group, and
> Nov-Dec in the second one.
>
>> Run a cronjob that does the roll-up for you periodically and use  
>> single
>> docs in the meantime that can be later deleted.
>
> How does it integrate with replication?
>
> Anyway. Haha...
>


Re: The Blog

Posted by Mister Donut <la...@gmail.com>.
Anyway, I won't post anymore. Apparently it's like Ruby on Rails,
superiority and if you don't like it, then leave. So, good luck. (Omg
we don't want you anyway, lololol!)

I found this poking around the interwebs, following some blogs, found
a few entries, and here I was. The wiki left me confused because a
lot of it is not yet done (Yes I know you are not done yet).

Two users shared two projects, which I am really thankful for, they
showed me how you could make use of CouchDB's features (funnily enough,
neither of them was a web application).

> which just uses a limited feature set of CouchDB doesn't
> mean they don't get it.

Well I wasn't here to find out what you can do just as well with anything
else that allows you to store and retrieve.

> Not sure what you exactly propose, but streaming
> the view result to the client and cutting at a date
> where the month marker bumps to 10 seems pretty
> reasonable to get this into two buckets.

A view that gives you the Jan-Nov documents in the first group, and
Nov-Dec in the second one.

> Run a cronjob that does the roll-up for you periodically and use single
> docs in the meantime that can be later deleted.

How does it integrate with replication?

Anyway. Haha...

Re: The Blog

Posted by Chris Anderson <jc...@apache.org>.
On Mon, Feb 9, 2009 at 10:34 PM, Mister Donut <la...@gmail.com> wrote:
> Let's say you
> want to count pageviews. Easy, insert a document for every pageview,
> create a "sum-view". But, this will lead to way too many documents?
> Doesn't seem feasible.

I've got an old review of CouchDB as a logging platform here:

http://jchrisa.net/drl/_show/sofa/post/wide_finder_in_couchdb

The key is incremental view builds. If you have the disk to support
views of your logs, CouchDB can definitely outperform any of the other
Wide Finder solutions when the number of queries and the size of the
data go up.

You don't have to create a document per log-line... it's probably best
to buffer lines in a process, and save them to CouchDB in batches of
roughly 1,000 docs at a time, depending on how many lines you put in a
single doc.
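
A rough sketch of the buffering idea, under the assumption that each log
line becomes one small document. `BatchBuffer` and the document shape are
hypothetical; `saveBatch` is whatever function actually stores a batch, for
example one that POSTs `{docs: [...]}` to the database's `_bulk_docs`
endpoint. Here it only records batch sizes, so the example runs without a
server.

```javascript
// Hypothetical buffer: collects log lines and hands them off in batches.
class BatchBuffer {
  constructor(batchSize, saveBatch) {
    this.batchSize = batchSize;
    this.saveBatch = saveBatch; // caller-supplied storage function
    this.pending = [];
  }

  add(line) {
    this.pending.push({ type: "logline", line });
    if (this.pending.length >= this.batchSize) this.flush();
  }

  // Save whatever is buffered; call once more at shutdown for the remainder.
  flush() {
    if (this.pending.length === 0) return;
    const docs = this.pending;
    this.pending = [];
    this.saveBatch(docs);
  }
}

const saved = [];
const buffer = new BatchBuffer(1000, docs => saved.push(docs.length));
for (let i = 0; i < 2500; i++) buffer.add("GET /page/" + i);
buffer.flush();
console.log(saved);
```

With 2,500 lines and a batch size of 1,000, the storage function is called
three times instead of 2,500.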

Also, I think Patrick's answers were pretty much right.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: The Blog

Posted by Mister Donut <la...@gmail.com>.
> On the opposite end of the spectrum, we have extremely large RDBMS
> installs on huge iron. IIRC, I read an article that the 37signals crew
> just bought a 32 GiB machine to scale up Basecamp.
> the whole system would require many man hours of systems engineering
> or a huge rewrite of the base application logic.

Yeah but CouchDB doesn't magically solve that problem, does it?
RDBMS + Memcached goes a very long way.

Also, Basecamp seems to be "easy" to partition, "like" Flickr, (mind
you, "easy"!), because most accounts are "self-contained". There is a
project, a few users. They don't overlap. Of course, once you detach
users from their projects, ... or allow users to comment on
everything, that's where it gets hard? The problems start when
everything relates to everything. (see:)

> Another thought that just occurred to me. Another way of describing
> the difference is that in CouchDB the data is important. In an RDBMS,
> it's the relations that are important (or the focus at least).

That is a very interesting point. I tend to agree after these emails.
Also, most example applications of CouchDB that users have presented
in this very thread seem to be about data, and not about relations.
The sync to S3, the message queue with included aggregating
reporting... Whereas a typical web application (wiki, blog),
everything is about relations. Isn't that so? Users in user groups
with permissions writing posts belonging to categories, having
comments by other users. I don't really see how you can just throw
that out of the window?! I mean, exactly why would you use the
key/value pairs user_x, permission_x, group_x, entry_x, comment_x
instead of just five tables. I don't understand where the CouchDB
implementation shines for just exactly that thing. Mind you, this is
RDBMS thinking (again), and I totally see the reason to use CouchDB
for those projects outlined here (S3, Queue).

I think there is a bit of a problem of the approach here. Most
"convinced" CouchDB users seem to want to tell you "Well RDBMS suck,
CouchDB is much better", "Why?", "Well, because... [Concepts,
Key/Value, Map/Reduce]". Instead of trying to show you a few hands-on
approaches of solving a problem *that has not been solved* by RDBMS
yet.

I think the message reporting queue with aggregating, an example like
that, instead of just "The Blog in 5 Minutes" you see everywhere,
would go a long way into showing what CouchDB is all about.

It can show how useful Map/Reduce can be (to create aggregate
reports), and how you can possibly have two message queues that can
stay in sync.

> Yes, you can. Just not in the way you're used to thinking. A
> Map/Reduce view is a fixed *mapping* from documents to a sorted
> key/value space.

Yes, you just said it. *Fixed*. If you have 200 documents, 100 from
Jan to Nov, and 100 from Nov to Dec, there is no way you can fill them
into two buckets ("Jan-Nov" and "Nov-Dec"). It would require variable
conditions.

> Also, you may count things.

I never said you couldn't. I said you cannot count like += and you
cannot aggregate counts to get rid of all the documents. Let's say you
want to count pageviews. Easy, insert a document for every pageview,
create a "sum-view". But, this will lead to way too many documents?
Doesn't seem feasible. Of course, CouchDB isn't the tool for that job,
but I would still like to see some really hands on examples of what
CouchDB can do. I think we covered the concepts now.

> Patrick was trying to help and was correct.

No, he is not.

Re: The Blog

Posted by Mister Donut <la...@gmail.com>.
Wouks, too many replies!

I learned a lot by just reading. I will just reply to a few comments.

> To be honest, I think saying RDBMS and CouchDB are for different
> solutions is just you guys being nice. I think that any application
> would benefit from using the CouchDB model and only in very specific,
> very demanding cases an RDBMS would be better. I can't think of any
> examples though.

Yes, see, this is what I started to believe as well. But this thread
showed me that this idea is wrong. RDBMS are there and were written
for a reason. CouchDB solves a different problem. It's not just a new
storage layer that can be plugged into any existing (RDBMS) database
abstraction layer and then "it just works".

http://couchdb.apache.org/docs/intro.html

"An object-oriented database. Or more specifically, meant to function
as a seamless persistence layer for an OO programming language."

> I'll help you: I think it would be easier to create a wiki with
> CouchDB than with an RDBMS. It is possible in both but CouchDB just
> makes it easier. I suppose we'd have to ask the http://couch.it guys
> to know if that's true.

Well. How does CouchDB make it easier? I think it'd be easier on some
parts, and harder on other parts. As I said, I don't think (anymore)
that CouchDB is supposed to replace an RDBMS, but instead solve a
different problem.

As soon as you need to scale horizontally, replication comes into
play, think Wikipedia. Because of the eventual consistency, you might
have many different versions of pages "live". Just think what happens
when users start to edit and save old versions. This is a very
interesting read

http://www.facebook.com/note.php?note_id=23844338919

About cache invalidation. As soon as you are forced to use replication
(which is the whole point of CouchDB? Clouds? Scaling horizontally?),
I just don't think you can actually build a typical web application
(wiki, forum, blog) that gives the user a consistent experience.

Now, if you build something like Antony Blakey (#9 in this thread),
that seems like a really great idea on how to use CouchDB.

> I know it would be fairly simple to have an "accounts" array field on a JSON
> user-account document - that way no single entity's account could be
> changed by more than one write at the same time... seems ridiculously simple
> - but is there a case where this could fail?

Well, isn't the standard example:

Person A only has $500.

1 Check A's account: $500
2 Set A's account: $0
3 Check B's account: $1000
4 Set B's account: $1500

Now, if at any time between 1 and 2 or 3 and 4 you modify A's or B's
account, you have lost.

This is where it could fail? Assuming the four actions are not sent in
the same request.

Again. This is RDBMS thinking. In CouchDB, the balance should probably
be a view. But there is still no way to enforce that you have enough
money on your account before you can withdraw. Is there?

Check Balance
<- Other Instance Withdraws
Withdraw
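
On a single node and for a single document, the race above is what `_rev` guards against: an update based on a stale revision is rejected with a conflict, and the writer has to re-read and re-check. A minimal sketch, assuming an in-memory stand-in for the database (`TinyStore` and `withdraw` are hypothetical names, and real revisions are strings rather than integers):

```javascript
// In-memory stand-in for a single CouchDB node: an update is accepted only
// if the caller supplies the document's current _rev, otherwise it conflicts.
class TinyStore {
  constructor() {
    this.docs = new Map();
  }
  get(id) {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : null;
  }
  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      return { ok: false, error: "conflict" }; // CouchDB answers 409 here
    }
    const nextRev = (currentRev || 0) + 1; // simplified integer revision
    this.docs.set(doc._id, { ...doc, _rev: nextRev });
    return { ok: true, rev: nextRev };
  }
}

// Check-then-withdraw, retried whenever someone else updated the account first.
function withdraw(store, accountId, amount, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const account = store.get(accountId);
    if (account.balance < amount) {
      return { ok: false, error: "insufficient funds" };
    }
    account.balance -= amount;
    const result = store.put(account);
    if (result.ok) return result;
    // Conflict: the loop re-reads the balance and checks again.
  }
  return { ok: false, error: "too many conflicts" };
}

const store = new TinyStore();
store.put({ _id: "account-a", balance: 500 });

// Two instances read the same revision, then both try to withdraw $400.
const staleCopy = store.get("account-a");
const first = withdraw(store, "account-a", 400);
staleCopy.balance -= 400;
const second = store.put(staleCopy); // rejected: its _rev is stale
const third = withdraw(store, "account-a", 400); // re-reads: only $100 left
console.log(first, second, third);
```

This covers one document on one node only. It does not make a transfer between two account documents atomic, and it says nothing about writes that land on different replicas.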

> Things that CouchDB is better at:
> The interweb.
> Things that an RDBMS is better at:
> Huge amounts of business logic. As in the Oracle install running your
> favorite hospital. Think along the lines of 10's and 100's of
> thousands of lines of app logic in the DB itself.

You know, I am trying really hard, but these comments just contribute
absolutely nothing to the discussion.

> You can do that with Map/Reduce.
> Create a view that gets all the comments and get them with limit=0,
> there's your counter.

No, you cannot. *Variable* criteria. A Map/Reduce is a fixed criteria.
Also, a counter in the most abstract meaning. The only way to count
something in CouchDB is to add every item to the database and then use
a view. There is no +=. And there is no way to aggregate the count
into a single key.
Patrick, if you aren't trying, then don't. There are enough people who
actually try.

Re: The Blog

Posted by Mister Donut <la...@gmail.com>.
> Actually that's not true. We aren't using a system like SimpleDB
> where your changes might not be immediately available in subsequent
> queries by the same user.

Isn't the point of replication to handle massive reads? If so, you'd
be using load balancing? So you cannot predict which server will be
chosen?

Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
On 9 Feb 2009, at 15:27, Mister Donut wrote:
> No, I was talking about the "Stuffing" implementation. All it does is
> adding a schema-free field to an existing database? I just don't see
> what it has anything to do with CouchDB?

Stuffing adds a hash to an AR-model that gets stored into CouchDB
when the rest of the model is stored in whatever RDBMS you use.


>> How? (Assuming you have a use-case in mind, can you explain that?)
>
> Again, about the "Stuffing". It doesn't handle the lack of immediate
> consistency. This is just what I seem to observe here. Everyone
> praises the schema-free and JSON, but no one keeps the *eventual*
> consistency in mind?

The use-case here is the single node. A single node is consistent,
just like an RDBMS. You don't have to use CouchDB in the multi-node
configuration. You can just use REST & JSON of CouchDB without
ever needing to think about eventual consistency.


>> Again, can you wrap that into a concrete example, I don't quite get
>> what that mini-RDBMS is and how your understanding of replication ties
>> into that :)
>
> You have to deal with the *eventual* consistency in your applications
> don't you? And isn't that incredibly hard and expensive? I mean just
> think about the end user, when he might put something in CouchDB, but
> not immediately see it, in fact, it might be gone for a very long time.
> What interactive application can work with that?

Yes, you need to make a trade-off on "immediately". On the local node
the user sees his changes immediately, but potentially not on a remote
node due to network partition. If your application is remote only and
you have a multi-node cluster where the master the user is writing to
dies before his changes got replicated, he might see "inconsistent"
data, but in that case, the setting could be cached on the application
or user end, or the GUI could inform the user that, despite the
immediate success message, his setting couldn't be stored permanently.
This is an edge case though. You can also make single cluster nodes
highly available and write to two or three servers with any write and
only report success if all report success. This has other implications
though. Again, trade-offs.
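
The "write to two or three servers and only report success if all report success" idea could look roughly like this on the application side. It is a sketch under assumptions: `writeToAll` and `fakeWrite` are hypothetical helpers, and the write callback stands in for whatever HTTP request the application would really send.

```javascript
// Hypothetical helper: send the same write to several nodes and report
// success only if every node accepted it. `writeToNode` is an assumed
// callback (for example an HTTP PUT); it is not a CouchDB API.
async function writeToAll(nodes, doc, writeToNode) {
  const results = await Promise.allSettled(
    nodes.map((node) => writeToNode(node, doc))
  );
  const failed = nodes.filter((_, i) => results[i].status === "rejected");
  return { ok: failed.length === 0, failed };
}

// Fake nodes for illustration: one of them is down.
const downNodes = new Set(["node-c"]);
async function fakeWrite(node, doc) {
  if (downNodes.has(node)) throw new Error(node + " unreachable");
  return { node, id: doc._id };
}

const allUp = writeToAll(["node-a", "node-b"], { _id: "setting" }, fakeWrite);
const oneDown = writeToAll(["node-a", "node-c"], { _id: "setting" }, fakeWrite);
allUp.then((result) => console.log("all up:", result));
oneDown.then((result) => console.log("one down:", result));
```

One of the "other implications": when a node fails, the nodes that did accept the write still hold it, so the application has to decide whether to retry, roll back, or let replication catch up.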


>> I have another contract about to start for a server app where all the
>> data is maintained on the client's desktop, previewed with full
>> functionality, and then replicated to an EC2 instance. This can be
>> done with traditional databases, but it's trivial with CouchDB,
>
> Well, this is trivial with all databases? Just import and export. It's
> just copying a file.

Plus locking the DB in the meantime. CouchDB replication is live.


> Now imagine two users working on the data. Yes,
> you have replication built in, so no data gets lost. But you still
> need to figure out all the merging? Hum.

Merging is application dependent and can rarely be solved generically.
We've been talking about leaving merging to a user-specifiable function
and potentially shipping with a set of functions for common cases. E.g.
if "older" entries always lose and all docs have a timestamp field,
automatic resolution of conflicts is possible. Or the user-specified
merge function could determine that only different fields in a doc were
changed and auto-merge the two revisions. But no work has been
done on that end. As I said in the last mail, we're still building
this thing.
Plus: Both things can be done in application-land as well.
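
To make the two cases concrete, here is how such merge functions might look in application-land. Both functions are hypothetical and not part of CouchDB; the `timestamp` field and the availability of a common ancestor revision (`base`) are assumptions, and bookkeeping fields such as `_rev` would have to be skipped in practice.

```javascript
// Two hypothetical conflict-resolution functions of the kind described above.
// Neither is a CouchDB API; both assume plain JSON documents.

// "Older entries always lose": keep the revision with the later timestamp.
// Assumes every document carries a numeric `timestamp` field.
function newestWins(a, b) {
  return a.timestamp >= b.timestamp ? a : b;
}

// Field-level merge: given the common ancestor and two conflicting revisions,
// take each side's changes if they touched different fields.
// Returns null when both sides changed the same field differently.
function mergeDisjointFields(base, a, b) {
  const merged = { ...base };
  const fields = new Set([...Object.keys(base), ...Object.keys(a), ...Object.keys(b)]);
  for (const field of fields) {
    const baseValue = JSON.stringify(base[field]);
    const aValue = JSON.stringify(a[field]);
    const bValue = JSON.stringify(b[field]);
    const aChanged = aValue !== baseValue;
    const bChanged = bValue !== baseValue;
    if (aChanged && bChanged && aValue !== bValue) {
      return null; // genuine conflict, needs application or human judgement
    }
    const winner = aChanged ? a : bChanged ? b : null;
    if (winner) {
      if (field in winner) merged[field] = winner[field];
      else delete merged[field]; // the field was removed on that side
    }
  }
  return merged;
}

const base = { title: "Hello", body: "First draft" };
const editA = { title: "Hello, world", body: "First draft" };
const editB = { title: "Hello", body: "Second draft" };
const editC = { title: "Goodbye", body: "First draft" };

console.log(mergeDisjointFields(base, editA, editB)); // both edits kept
console.log(mergeDisjointFields(base, editA, editC)); // null: same field changed
console.log(newestWins({ id: "x", timestamp: 1 }, { id: "y", timestamp: 2 }));
```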

(Aside, patches welcome :)

Cheers
Jan
--


Re: The Blog

Posted by Damien Katz <da...@apache.org>.
On Feb 9, 2009, at 9:27 AM, Mister Donut wrote:

>> I'm suspecting here
>> that you assume that views are created on demand, based on user-input.
>
> No, I understand.
>
>> Totally generic object behaviour abstractions
>> in SQL need something like 8 tables, there's no way this flies :)
>
> No, I was talking about the "Stuffing" implementation. All it does is
> adding a schema-free field to an existing database? I just don't see
> what it has anything to do with CouchDB?
>
>> How? (Assuming you have a use-case in mind, can you explain that?)
>
> Again, about the "Stuffing". It doesn't handle the lack of immediate
> consistency. This is just what I seem to observe here. Everyone
> praises the schema-free and JSON, but no one keeps the *eventual*
> consistency in mind?
>
>> Again, can you wrap that into a concrete example, I don't quite get
>> what that mini-RDBMS is and how your understanding of replication ties
>> into that :)
>
> You have to deal with the *eventual* consistency in your applications
> don't you? And isn't that incredibly hard and expensive? I mean just
> think about the end user, when he might put something in CouchDB, but
> not immediately see it, in fact, it might be gone for a very long time.
> What interactive application can work with that?

Actually that's not true. We aren't using a system like SimpleDB
where your changes might not be immediately available in subsequent
queries by the same user. Eventual consistency in CouchDB refers to
remote replication, where multiple changes that otherwise should be
grouped together won't necessarily replicate together, and certainly not
in one transaction. But eventually they all get there.

-Damien

>
>
>> I have another contract about to start for a server app where all the
>> data is maintained on the client's desktop, previewed with full
>> functionality, and then replicated to an EC2 instance. This can be
>> done with traditional databases, but it's trivial with CouchDB,
>
> Well, this is trivial with all databases? Just import and export. It's
> just copying a file. Now imagine two users working on the data. Yes,
> you have replication built in, so no data gets lost. But you still
> need to figure out all the merging? Hum.


Re: The Blog

Posted by Antony Blakey <an...@gmail.com>.
On 10/02/2009, at 12:57 AM, Mister Donut wrote:

>> I have another contract about to start for a server app where all the
>> data is maintained on the client's desktop, previewed with full
>> functionality, and then replicated to an EC2 instance. This can be
>> done with traditional databases, but it's trivial with CouchDB,
>
> Well, this is trivial with all databases? Just import and export. It's
> just copying a file. Now imagine two users working on the data. Yes,
> you have replication built in, so no data gets lost.

But in Couch it's both hot and incremental, and requires no
configuration/scripting etc. No copying of files. Just a single HTTP
request.
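
As an illustration of that single request: replication is triggered by POSTing a JSON body with `source` and `target` to `/_replicate`. The sketch below only builds the request; the host and database names are placeholders, and `buildReplicationRequest` is a hypothetical helper.

```javascript
// Builds the HTTP request for a one-off replication. The host and database
// names are placeholders; adjust them to your setup.
function buildReplicationRequest(couchUrl, source, target) {
  return {
    url: couchUrl + "/_replicate",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ source, target }),
    },
  };
}

const replication = buildReplicationRequest(
  "http://localhost:5984",
  "blog",
  "http://example.com:5984/blog"
);
console.log(replication.url, replication.options.body);

// To actually send it (needs a running CouchDB and a runtime with fetch):
// fetch(replication.url, replication.options)
//   .then((res) => res.json())
//   .then(console.log);
```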

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

If at first you don’t succeed, try, try again. Then quit. No use being  
a damn fool about it
   -- W.C. Fields


Re: The Blog

Posted by Mister Donut <la...@gmail.com>.
> I'm suspecting here
> that you assume that views are created on demand, based on user-input.

No, I understand.

> Totally generic object behaviour abstractions
> in SQL need something like 8 tables, there's no way this flies :)

No, I was talking about the "Stuffing" implementation. All it does is
adding a schema-free field to an existing database? I just don't see
what it has anything to do with CouchDB?

> How? (Assuming you have a use-case in mind, can you explain that?)

Again, about the "Stuffing". It doesn't handle the lack of immediate
consistency. This is just what I seem to observe here. Everyone
praises the schema-free and JSON, but no one keeps the *eventual*
consistency in mind?

> Again, can you wrap that into a concrete example, I don't quite get what
> that mini-RDBMS is and how your understanding of replication ties
> into that :)

You have to deal with the *eventual* consistency in your applications
don't you? And isn't that incredibly hard and expensive? I mean just
think about the end user, when he might put something in CouchDB, but
not immediately see it, in fact, it might be gone for a very long time.
What interactive application can work with that?

> I have another contract about to start for a server app where all the
> data is maintained on the client's desktop, previewed with full
> functionality, and then replicated to an EC2 instance. This can be
> done with traditional databases, but it's trivial with CouchDB,

Well, this is trivial with all databases? Just import and export. It's
just copying a file. Now imagine two users working on the data. Yes,
you have replication built in, so no data gets lost. But you still
need to figure out all the merging? Hum.

Re: The Blog

Posted by Brian Candler <B....@pobox.com>.
> > CouchDB won't allow you to "jump to page X", but if you look at
> > e.g. Google, it doesn't work either. [...]
> > But surrogate keys are considered harmful and I'd say (but that
> > really depends on the application), not very helpful.
> 
> I guess I was assuming that CouchDB, due to its different nature, has
> a sophisticated solution for this. But apparently pagination is a
> problem that is really hard to solve.

It seems to me that CouchDB is at least no worse than an RDBMS here.

Any RDBMS I know builds its indexes using B-trees. So if you do

  SELECT ... FROM db ORDER BY k OFFSET 5000 LIMIT 10

then you're forcing the SQL database to traverse its B-tree index for 5000
entries, then retrieve the next 10, then use those to find the rows you're
interested in.

If my understanding is right, then exactly the same is true of couchdb if
you use the skip/limit options on a view.

Both can use relative paging (e.g. SELECT ... WHERE k >= startkey) if you're
only interested in "next page", "previous page". That's what I'd use for
very large datasets. You can easily do links for the next 10 pages (say), by
selecting more than you need to display in the first page.
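
A minimal sketch of that relative paging, simulated on an in-memory sorted array rather than a real view. `queryView` and `fetchPage` are hypothetical helpers; keys are assumed unique here (with duplicate keys a real view would also need `startkey_docid`).

```javascript
// Relative paging over a sorted view, simulated with an in-memory array.
// `queryView` stands in for a view request with startkey and limit.
function queryView(rows, { startkey, limit }) {
  const from = startkey === undefined ? rows : rows.filter((r) => r.key >= startkey);
  return from.slice(0, limit);
}

// Ask for one row more than the page size; the extra row is not displayed,
// its key becomes the startkey of the "next page" link.
function fetchPage(rows, startkey, pageSize) {
  const result = queryView(rows, { startkey, limit: pageSize + 1 });
  const page = result.slice(0, pageSize);
  const nextStartkey = result.length > pageSize ? result[pageSize].key : null;
  return { page, nextStartkey };
}

// Rows must already be sorted by key, as view rows are.
const viewRows = [];
for (let i = 1; i <= 7; i++) {
  viewRows.push({ key: "2009-02-0" + i, value: "post " + i });
}

const firstPage = fetchPage(viewRows, undefined, 3);
const secondPage = fetchPage(viewRows, firstPage.nextStartkey, 3);
const thirdPage = fetchPage(viewRows, secondPage.nextStartkey, 3);
console.log(firstPage, secondPage, thirdPage);
```

Fetching more than one extra row (say ten pages' worth of keys) gives the start keys for several "next page" links at once.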

However, couchdb offers you a number of options which a SQL database
doesn't. For instance:

1. When you generate your view, you can emit the entire document (or any
selection of fields of special interest) as the value. This means that your
index query which returns 10 keys can return the 10 documents as well; a SQL
database may have to do 10 additional head seeks to return the rows.

There is a tradeoff in index disk space used, of course, but the choice is
up to you.

2. Updating. If you do 500 INSERTs followed by one SELECT in a SQL database,
unless you use some admin-level tricks like temporarily disabling indexing,
all affected indexes will be updated for every INSERT.

With couchdb, you'll get a single update of all indexes when the SELECT
takes place. This may add some latency, but it's far less work than updating
the indexes 500 times.

3. The reduce data structure is extremely smart. If there are N documents
stored in one B-tree node, then the pre-computed reduce value for those N
documents is stored in that node too.

So if you ask for an aggregate value from K1..Kn, and this spans some whole
blocks of B-tree nodes, only the end ones need to be re-reduced:

  ________K1___   _______________  ________________  _____K4_________
          <--->   <------------->  <-------------->  <----->
        reduce       already           already       reduce
          R1'        reduced R2        reduced R3      R4'

and couchdb just calculates reduce(R1',R2,R3,R4') to get the final answer.
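
The idea can be sketched with a flat list of nodes, each carrying its pre-computed reduce value. This is a simplification made for the example (a real B-tree has several levels and the structure below is not CouchDB's actual storage format); the reduce is a plain sum.

```javascript
// Each "node" holds a run of sorted rows plus the reduce value of that run,
// computed once when the node was written (here: a sum).
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

function buildNodes(rows, nodeSize) {
  const nodes = [];
  for (let i = 0; i < rows.length; i += nodeSize) {
    const chunk = rows.slice(i, i + nodeSize);
    nodes.push({ rows: chunk, reduced: sum(chunk.map((r) => r.value)) });
  }
  return nodes;
}

// Range reduce: nodes fully inside [startKey, endKey] contribute their stored
// value; only partially covered nodes have their rows reduced again.
function rangeReduce(nodes, startKey, endKey) {
  const partials = [];
  let rowsVisited = 0;
  for (const node of nodes) {
    const first = node.rows[0].key;
    const last = node.rows[node.rows.length - 1].key;
    if (last < startKey || first > endKey) continue; // outside the range
    if (first >= startKey && last <= endKey) {
      partials.push(node.reduced); // already reduced
    } else {
      const inside = node.rows.filter((r) => r.key >= startKey && r.key <= endKey);
      rowsVisited += inside.length;
      partials.push(sum(inside.map((r) => r.value)));
    }
  }
  return { total: sum(partials), rowsVisited }; // final step combines partials
}

const sortedRows = [];
for (let k = 1; k <= 16; k++) sortedRows.push({ key: k, value: k });
const treeNodes = buildNodes(sortedRows, 4);
const answer = rangeReduce(treeNodes, 3, 14);
console.log(answer);
```

Of the twelve rows in the range, only the four in the two end nodes are looked at individually; the two middle nodes contribute their stored values.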

In principle, an RDBMS could use the same kind of logic for

    select count(*) from db where k BETWEEN 'k1' AND 'k4'

but I don't know if any of them do. I highly doubt that they do it for
arbitrary aggregation functions like

    select sum(n) from db where k BETWEEN 'k1' AND 'k4'

Couchdb makes this trivial and highly efficient, because you explicitly ask
for which summary values you want to be handled in this way.

The downside, of course, is that you have to *plan* this in couchdb, by
building your views appropriately. A SQL database can take any arbitrary
query, and have a stab at it using whatever combination of index scans and
table reads it thinks is appropriate. But having seen SQL databases make
very bad decisions in this area, I don't consider this something to trumpet
about.

The other downside is when doing joins across multiple 'tables', which in
couchdb would be one document cross-referencing to another. You have to
build your view with multiple rows, one from each document, and combine them
in the client. This isn't particularly hard, but it does negate the reduce
functionality.
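
A sketch of what "multiple rows, one from each document, combined in the client" might look like. The document shapes (`type`, `post_id`) and the helper names are assumptions for this example, and the view is simulated in memory.

```javascript
// Map function for a "post with its comments" view. Document shapes
// (`type`, `post_id`) are assumptions for this example.
function mapPostsAndComments(doc, emit) {
  if (doc.type === "post") {
    emit([doc._id, 0], doc); // the post sorts first within its id
  } else if (doc.type === "comment") {
    emit([doc.post_id, 1], doc); // its comments follow directly
  }
}

// Simulate the view: run the map, then sort rows by key.
function buildView(docs) {
  const rows = [];
  docs.forEach((doc) => mapPostsAndComments(doc, (key, value) => rows.push({ key, value })));
  rows.sort((a, b) =>
    a.key[0] < b.key[0] ? -1 : a.key[0] > b.key[0] ? 1 : a.key[1] - b.key[1]
  );
  return rows;
}

// Client-side "join": walk the sorted rows and attach comments to their post.
function combine(rows) {
  const posts = [];
  for (const row of rows) {
    if (row.key[1] === 0) {
      posts.push({ ...row.value, comments: [] });
    } else if (posts.length > 0 && posts[posts.length - 1]._id === row.key[0]) {
      posts[posts.length - 1].comments.push(row.value);
    }
  }
  return posts;
}

const blogDocs = [
  { _id: "c1", type: "comment", post_id: "entry2", text: "Nice" },
  { _id: "entry1", type: "post", title: "First" },
  { _id: "c2", type: "comment", post_id: "entry1", text: "Hi" },
  { _id: "entry2", type: "post", title: "Second" },
  { _id: "c3", type: "comment", post_id: "entry2", text: "Agreed" },
];

const joined = combine(buildView(blogDocs));
console.log(JSON.stringify(joined, null, 2));
```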

> You still need one lookup for every blog entry on a page.
> And there is no way you can ever store the comment count inside the blog
> entry.

I'm not sure what an RDBMS offers here that couchdb does not.

A simple map/reduce query will give you a count of all the comments for a
blog entry (and it will scale to millions of comments).

Sure, you can construct a SQL join which gives you the blog entry plus its
comments count in one go, but the SQL database is doing the same sort of
work behind the scenes.

If you have multiple blog entries on a page, a single couchdb group query
can give you all the comment counts in one go. If the keyspace is contiguous
(e.g. by blog posting date) then it's easy (*). And even if not, you can use
the POST multiple-fetch API to get all the comments counts for an arbitrary
set of blog entries in one request.

But perhaps I'm missing something from your requirements.

Regards,

Brian.

(*) If each comment document has a blog_entry_id, then you can emit
something like

    keys                                 values

    ["2009/02/01/entry1","comment1"]     null
    ["2009/02/01/entry1","comment2"]     null
    ["2009/02/09/entry2","comment3"]     null
    ["2009/02/09/entry2","comment4"]     null
    ["2009/02/09/entry2","comment5"]     null

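Use a counting reduce function:

    // ks: keys, vs: values, co: true when combining partial results (rereduce)
    function(ks, vs, co) {
      if (co) {
        // vs holds counts from earlier reduce calls: add them up
        return sum(vs);
      } else {
        // vs holds one value per emitted row: count the rows
        return vs.length;
      }
    }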

For the comment counts for all blog entries this month, ask for

    group=true&group_level=1&startkey=["2009/02/01"]&endkey=["2009/03/01"]

Getting the text of all these blog entries would be a separate query.
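
To show what that grouped query returns, here is an in-memory simulation using the rows listed above. `queryGroupLevel` is a hypothetical stand-in for the view engine; its range check is simplified to the first key element and uses plain string comparison rather than CouchDB's collation.

```javascript
// The counting reduce from the example above, with the same behaviour.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

function countReduce(keys, values, rereduce) {
  return rereduce ? sum(values) : values.length;
}

// Simulated grouped query: rows whose key starts with the same
// `groupLevel` elements are reduced together.
function queryGroupLevel(rows, groupLevel, startkey, endkey) {
  const groups = new Map();
  for (const row of rows) {
    const prefix = row.key.slice(0, groupLevel);
    // Simplified range check on the first key element only.
    if (prefix[0] < startkey[0] || prefix[0] >= endkey[0]) continue;
    const id = JSON.stringify(prefix);
    if (!groups.has(id)) groups.set(id, { key: prefix, values: [] });
    groups.get(id).values.push(row.value);
  }
  return [...groups.values()].map((g) => ({
    key: g.key,
    value: countReduce(null, g.values, false),
  }));
}

const commentRows = [
  { key: ["2009/02/01/entry1", "comment1"], value: null },
  { key: ["2009/02/01/entry1", "comment2"], value: null },
  { key: ["2009/02/09/entry2", "comment3"], value: null },
  { key: ["2009/02/09/entry2", "comment4"], value: null },
  { key: ["2009/02/09/entry2", "comment5"], value: null },
];

const counts = queryGroupLevel(commentRows, 1, ["2009/02/01"], ["2009/03/01"]);
console.log(JSON.stringify(counts));
```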

I think this shows that in couchdb, there is an advantage to using a doc id
which has relevance to the application.

The SQL normalization brigade would say use a random uuid for every blog
entry and every comment. If you do, I agree that makes it a bit harder to do
this sort of aggregation. But I think the multi-fetch API should still work.

Re: The Blog

Posted by Mister Donut <la...@gmail.com>.
I'll jump right in.

> CouchDB won't allow you to "jump to page X", but if you look at
> e.g. Google, it doesn't work either. [...]
> But surrogate keys are considered harmful and I'd say (but that
> really depends on the application), not very helpful.

I guess I was assuming that CouchDB, due to its different nature, has
a sophisticated solution for this. But apparently pagination is a
problem that is really hard to solve.

> Can you elaborate on that? I don't quite get the "or duplicate data,
> basically anything that needs to be the same as something else" bit.

Well. Let's say you have a list of documents. You want to store some
information about the newest document in a separate key (instead of a
view, which might be slow? if you have too many). That isn't possible.
Or let's say you have documents, and categories. And many, many, many
of them. Again, the view to show the latest document might be too
slow, so you want to save that information in a separate key. Not
possible.

> A couple of things you can do with CouchDB replication (again, not  saying,
> that you can't do some of those with an RDBMS but it is getting harder
> the further you move down the list): [...]

Thank you for that list. I think that I, like many other users,
considering what I have read in blogs, expected something else
from CouchDB. I am not so sure where this is coming from.

Check the Ruby thing a few mails down. How exactly is that
implementation going to work without immediate consistency? Everyone
seems to be going on about it being schema free, but you can just add
a "param" field to any database and transparently (un)serialize and
there you have it, schema-free. If you actually have a few nodes (with
that implementation), it will break big, big time. I think that,
possibly because of the "Cloud Hype", I got into believing that it will
"just work". With anything that you throw at it. Like what Amazon SimpleDB
tells you it would.

Yes, Key/Value pairs are incredibly easy. MapReduce is amazing and
intriguing. But handling the replication, won't it be so difficult
that you end up with a Quasi-Mini-RDBMS anyway?

Now I got far away from my original questions, but I guess that
happens often in discussions.

Basically, now: "Is it possible to handle the replication in such a
way that you don't end up with a Mini-RDBMS anyway in the end?"

I would just, really really really, like to see an example that goes
beyond schema-free. That handles replication. I think that would show
where CouchDB shines, and where you'd fail with a RDBMS.

And, CouchDB isn't magical glue to make pagination work, either.

Re: The Blog

Posted by Mister Donut <la...@gmail.com>.
> It would be easier to follow your thoughts, if you would take the time
> to formulate them in complete sentences. Your offensive tone
> doesn't help either.

I don't know where I am being offensive, but I don't like when people
just assume that I am stupid or haven't done my research. Because, yes, I
have searched for pagination, and yes, I have read the Wiki, and yes,
I realize that CouchDB isn't an RDBMS, and no, I am not trying to put
my RDBMS ideas into CouchDB. And this isn't meant to be offensive, but
rather to show you that I really tried, to the best of my knowledge. I
know what MapReduce is. I know what Key/Value pairs are. I know how
CouchDB puts them together (the idea, I am not an Erlang hacker).

> limit is documented to be slow and should not be used to
> implement paging. That's what startkey and endkey are
> for and the mailing list archives show you how that is done.
> Which is not the only solution to pagination.

Startkey and Endkey allow you to implement "Back" and "Forward". I
wrote that. They do not allow you to jump directly somewhere. For
that, you'd somehow need to store start- and endkeys, but the nature
of CouchDB doesn't allow you to. Unless I am missing a point. Amazon
SimpleDB, and Google WebApps (sp?) suffer from the same problem, if
you read their forums. Basically they tell you: "Well, it doesn't
work".

> See CAP*, RDBMSs pick consistency and availability where CouchDB
> picks availability and partitionability.
> * Consistency, Availability, Partitioning, pick two.

I know, as I said, I read the book. Not trying to be offensive.
What application benefits from picking Partitionability and
Availability? And can work with a lack of Consistency?
You cannot store any kind of counters, or duplicate data, basically
anything that needs to be the same as something else.

> If you try to find 1:1 mappings of solutions from the RDBMS world
> to CouchDB, you must conclude that CouchDB sucks. But that
> doesn't mean that CouchDB sucks.

What I am trying to find is an example that, by using CouchDB,
offers a unique solution.
All examples of CouchDB I have seen so far, are either so simple, they
don't show what CouchDB is all about (probably the author doesn't get
it, not trying to be offensive, but posts like "10 Reasons Why CouchDB
Is Better Than (Random RDBMS)"... well.) or they try to solve a
problem that has already been solved by RDBMS.

Availability (well, yeah, I don't know what to say).
Partitionability:
The Replication feature is really interesting. It is intriguing. It
makes me want to make something with it. But I just cannot find anything
that could "exploit" this very feature.

Re: The Blog

Posted by Mister Donut <la...@gmail.com>.
Hello Patrick,

I am writing down thoughts as they come.
I would appreciate if you tried to follow.
Yes, I realize storing comments inside the entry is a bad idea.
I was following an introduction to CouchDB.

> Sure, use the view to get all comments and specify count=0.
> You can get them in one query, just group comments by post_id in your map reduce query.

I don't think you understand my point.
Yes, I know. Maybe you should re-read.
You still need one lookup for every blog entry on a page.
And there is no way you can ever store the comment count inside the blog entry.

> startkey, endkey and limit.

That sounds so great. But wait. LIMIT.
I know that from SQL. It doesn't scale.
Jumping to page 1234567 of ten million. Please, no.

And you cannot, ever, group items based on a variable criteria.
For example in batches of around one hundred.
Which solves pagination.
A view cannot provide that. But again, there is no way you can "manually" do it.

> Anything!

I challenge you. Build me a counter!
No seriously.

Pick one:
GROUP BY, LIMIT, _rev to fake transactions. Uh. Oh. Hello SQL?
You know something magical that allows you to avoid _rev.

So please tell me.
What exactly can I use CouchDB for that uses its strengths,
and not weaknesses? I am honestly not trying to make a fool of anyone.
The CouchDB book seems only to justify the design choices.
The Wiki is completely unhelpful.
Every time it gets interesting.
http://wiki.apache.org/couchdb/How_to_implement_tagging
It stops. That second article, that would certainly enlighten me.
Yes, how, would you go about implementing that?

I don't see how you can make it work but by using an awful lot of merging logic.
Isn't that why you use a RDBMS in the first place?

> It's already there.

You're right, I missed that one. It's even more scary.

Re: The Blog

Posted by Patrick Aljord <pa...@gmail.com>.
> MapReduce! We don't need live counter updates.
> But we still need to store the counter separately.
> Twenty blog entries on one page, twenty queries.

You can get them in one query, just group comments by post_id in your
map reduce query.

Re: The Blog

Posted by Noah Slater <ns...@apache.org>.
On Mon, Feb 09, 2009 at 02:57:27PM +0100, Jan Lehnardt wrote:
> On 9 Feb 2009, at 14:47, Noah Slater wrote:
>
>> On Mon, Feb 09, 2009 at 01:52:39PM +0900, Mister Donut wrote:
>>> Unless you use a lot of silly _rev.
>>> Which is exactly why RDBMS were built?
>>
>> Eh, what? Your email is scoring 7/10 on my troll'o'meter.
>
> I'm not sure if that helps with the discussion.

Sorry Jan! Guess I was feeling too trigger happy. ;)

-- 
Noah Slater, http://tumbolia.org/nslater

Re: The Blog

Posted by Jan Lehnardt <ja...@apache.org>.
Hi Noah,

On 9 Feb 2009, at 14:47, Noah Slater wrote:

> On Mon, Feb 09, 2009 at 01:52:39PM +0900, Mister Donut wrote:
>> Unless you use a lot of silly _rev.
>> Which is exactly why RDBMS were built?
>
> Eh, what? Your email is scoring 7/10 on my troll'o'meter.

I'm not sure if that helps with the discussion.

Cheers
Jan
--


Re: The Blog

Posted by Noah Slater <ns...@apache.org>.
On Mon, Feb 09, 2009 at 01:52:39PM +0900, Mister Donut wrote:
> Unless you use a lot of silly _rev.
> Which is exactly why RDBMS were built?

Eh, what? Your email is scoring 7/10 on my troll'o'meter.

-- 
Noah Slater, http://tumbolia.org/nslater