Posted to user@couchdb.apache.org by Dusty Leary <dl...@gmail.com> on 2008/12/16 21:18:23 UTC

view reduce and updating documents

Hi,
I'm wondering about the 'reduce' part of views and how updating a
document is handled.

I think I understand how arbitrary views work, and how they can be
stored in a B-tree, and an update is 'cheap'...
I suppose it's:
A)  Drop old documents from views when the document is updated.
B) Re-run the view function on all new documents when the view is re-read.

First, is this correct?

Second:  In the face of a reduce function, how does this work?  It
seems like in the general case, you need to rebuild the entire view,
since you can't "remove" the effects of a single document from the
reduce.

Unless maybe the reduce function is rerun on every view?
Or, partial reduce intermediates are cached, maybe in a tree?

-Dusty

Re: view reduce and updating documents

Posted by Paul Davis <pa...@gmail.com>.
This article is pretty good at explaining the update mechanism:

http://horicky.blogspot.com/2008/10/couchdb-implementation.html

On Tue, Dec 16, 2008 at 3:18 PM, Dusty Leary <dl...@gmail.com> wrote:
> Hi,
> I'm wondering about the 'reduce' part of views and how updating a
> document is handled.
>
> I think I understand how arbitrary views work, and how they can be
> stored in a B-tree, and an update is 'cheap'...
> I suppose it's:
> A)  Drop old documents from views when the document is updated.
> B) Re-run the view function on all new documents when the view is re-read.
>
> First, is this correct?
>

The right idea, but the actual implementation combines these steps.
There are some properties of the btree implementation that show up in
a fairly elegant but non-obvious way. What's technically happening is
that CouchDB iterates over the update_seq data to find documents that
have changed. If a document is deleted, then the output of the map can
be thought of as {deleted_doc_id, []}, i.e. zero results for that docid.
Updated and created docs are actually passed to the js function for
processing.
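
To make that concrete, here is a rough Python sketch of the idea, not
CouchDB's actual code. The names update_view, index, and changes are
made up for illustration, and a plain dict stands in for the btree. A
deleted doc is represented as None and simply yields zero rows.

```python
def update_view(index, last_seq, changes, map_fn):
    """Bring a view index up to date from a sequence-ordered change feed.

    index    -- dict mapping doc_id -> list of (key, value) rows (stand-in for the btree)
    last_seq -- highest sequence number already reflected in the index
    changes  -- iterable of (seq, doc_id, doc), where doc is None for a deletion
    map_fn   -- function taking a doc and yielding (key, value) rows
    """
    for seq, doc_id, doc in changes:
        if seq <= last_seq:
            continue  # already indexed
        # A deleted doc behaves like a doc whose map output is empty.
        rows = [] if doc is None else list(map_fn(doc))
        if rows:
            index[doc_id] = rows       # replaces any rows from the old revision
        else:
            index.pop(doc_id, None)    # zero results: drop the old rows
        last_seq = seq
    return last_seq
```

Dropping the old rows and writing the new ones happen in the same pass,
which is the sense in which steps A and B are combined.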

The results are then just written to the btree. Because the btree file
is append-only, a process can write to the file without clobbering the
old version of the btree, which other people may be reading. Readers
will pick up the newer btree version the next time they read the file
header.
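
A toy Python model of that reader/writer behaviour, under heavy
simplifying assumptions: AppendOnlyStore and its methods are
hypothetical names, and each commit appends a whole copy of the data
rather than only the changed btree nodes, which a real btree would
avoid.

```python
class AppendOnlyStore:
    """Toy append-only file: old records are never overwritten."""

    def __init__(self):
        self.log = []        # stands in for the file; we only ever append
        self.header = None   # position of the current root record

    def commit(self, mapping):
        # Write the new version at the end, then point the header at it.
        self.log.append(dict(mapping))
        self.header = len(self.log) - 1

    def snapshot(self):
        # A reader "reads the header" once and keeps using that root.
        return self.header

    def read(self, snapshot):
        return self.log[snapshot]


store = AppendOnlyStore()
store.commit({"a": 1})
old = store.snapshot()           # a reader starts here
store.commit({"a": 2})           # a writer appends a new version
reader_view = store.read(old)    # the reader still sees the old version
latest_view = store.read(store.snapshot())
```

The reader holding the old snapshot is unaffected by the second commit
and only sees the new data after fetching the header again.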

> Second:  In the face of a reduce function, how does this work?  It
> seems like in the general case, you need to rebuild the entire view,
> since you can't "remove" the effects of a single document from the
> reduce.
>
> Unless maybe the reduce function is rerun on every view?
> Or, partial reduce intermediates are cached, maybe in a tree?
>

This is another facet of the append-only btree, and of the nifty way
that Damien added storage of partial reduce results. I only have a
vague idea of how this works, mostly coming from the link I posted. I'm
pretty sure the interpretation is that you basically have a 'shadow'
tree that uses the btree leaf nodes. Then when your map changes a leaf
node and you go to read the reduce tree, it detects any new leaf nodes
(append only!) and re-runs the reduce up to the root. Then again, I
haven't actually grokked the code on this so I could be way off. :D
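
For what it's worth, the general idea of caching partial reductions in
a tree can be sketched in Python like this. It is an illustration of
the technique only, not CouchDB's implementation: Node, build,
reduction, and update_leaf are invented names, nodes are mutated in
place instead of appended, the input is assumed non-empty, and the
reduce function is assumed to be one that can be reapplied to its own
outputs (like sum).

```python
class Node:
    def __init__(self, children=None, values=None):
        self.children = children or []  # inner node: child nodes
        self.values = values or []      # leaf node: mapped values
        self.cached = None              # cached partial reduction
        self.dirty = True               # True until cached is valid


def build(values, fanout=2):
    """Build a tree over a non-empty list, `fanout` entries per node."""
    nodes = [Node(values=values[i:i + fanout])
             for i in range(0, len(values), fanout)]
    while len(nodes) > 1:
        nodes = [Node(children=nodes[i:i + fanout])
                 for i in range(0, len(nodes), fanout)]
    return nodes[0]


def reduction(node, reduce_fn, stats):
    """Return the reduction for this subtree, recomputing only dirty nodes."""
    if node.dirty:
        stats["calls"] += 1
        if node.children:
            parts = [reduction(c, reduce_fn, stats) for c in node.children]
            node.cached = reduce_fn(parts)   # reduce over child results
        else:
            node.cached = reduce_fn(node.values)
        node.dirty = False
    return node.cached


def update_leaf(root, path, new_values):
    """Replace a leaf's values and invalidate caches on the path to the root."""
    node = root
    node.dirty = True
    for i in path:
        node = node.children[i]
        node.dirty = True
    node.values = new_values


stats = {"calls": 0}
root = build([1, 2, 3, 4, 5, 6, 7, 8])
total = reduction(root, sum, stats)        # first read reduces every node
first_calls = stats["calls"]

stats["calls"] = 0
update_leaf(root, [0, 1], [30, 4])         # change the leaf holding [3, 4]
new_total = reduction(root, sum, stats)    # only the dirty path is re-reduced
second_calls = stats["calls"]
```

With eight values and a fanout of two there are seven nodes, so the
first read calls the reduce function seven times, while the read after
the single-leaf update calls it only three times (leaf, parent, root).
So nothing has to be "removed" from the old reduce value; the affected
partial results are just recomputed.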

> -Dusty
>

HTH,
Paul Davis