Posted to dev@couchdb.apache.org by Norman Barker <no...@gmail.com> on 2010/08/23 20:23:48 UTC

Re: multiview

Ryan,

I have cc'd the dev list since this is of general interest; comments inline.

On Mon, Aug 23, 2010 at 10:41 AM, Ryan Hill <rc...@computer.org> wrote:
> Hey there - I like your contribution, but have a couple of questions (that I
> didn't want to clutter up the dev-list discussion).
>
> -- Does your logic take into account map-only results that may have
> duplicate keys? In the simple case, this could mean preserving all the
> duplicates that satisfy the multi-view constraints (i.e. intersection), if
> that behavior seems generally logical.

Duplicates are kept when testing intersection.
>
> -- I find myself performing quite a few set operations on view results,
> currently entirely in memory (which I happen to have enough of) using python
> sets. I think your code could be generalized to take care of this, and that
> doing so would strengthen the case for inclusion into trunk. Possibly even
> more so than improving the iterative sections. Do you see this
> generalization being something you would also be interested in? Examples
> include union and difference, in addition to the current intersection being
> discussed. Composite operations (the difference result of one view taken
> w.r.t the intersection of two other views) should also be considered.
>

I am interested in union and difference, but I think we can do this
without holding the results in memory. Holding the results in memory
is fine if you have a few users, but I intentionally wrote it to be
streaming so that it scales.

I think getting intersection into trunk is a good first step. A
difference operation might be possible by inverting keys, and union
is really simple: just stream the results from each view one by one.
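
To make the streaming idea concrete, here is a rough sketch in Python
(not the actual multiview Erlang code). It assumes each view is an
iterable of (doc_id, value) rows sorted by doc_id; the function names
stream_union and stream_intersection are hypothetical. Neither function
buffers a whole result set, and duplicate rows on the left side of the
intersection are all kept.

```python
from itertools import chain


def stream_union(*views):
    """Yield every row from each view in turn; nothing is buffered."""
    return chain.from_iterable(views)


def stream_intersection(left, right):
    """Yield rows of `left` whose doc_id also occurs in `right`.

    Assumption: both inputs yield (doc_id, value) rows sorted by doc_id.
    Duplicate doc_ids in `left` are all kept.
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for doc_id, value in left:
        # Advance the right-hand stream until it catches up with doc_id.
        while current is not None and current[0] < doc_id:
            current = next(right_iter, None)
        if current is None:
            return  # right side exhausted, no further matches possible
        if current[0] == doc_id:
            yield doc_id, value
```

Intersecting more than two views could be done by nesting calls, since
the output of stream_intersection is itself sorted by doc_id.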

> -- Lastly, have you thought about this in the context of other iterative
> map-reduce implementations? Another pattern that I use frequently is to feed
> the results of one M/R that accumulates totals for a group of documents into
> a second M/R that applies a statistical test to filter out documents
> that fail to meet certain significance criteria. This is fairly easy to
> implement in something like Riak or Twister, but would be useful to have in
> couch as well.
>
I haven't thought about feeding results from one map result to
another, but agree it is interesting.
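
For reference, the chained pattern described above might look roughly
like this in plain Python. This is only an illustration of the idea,
not anything couch provides: the "group" and "amount" field names, the
function names, and the simple min_total threshold (standing in for a
real statistical test) are all assumptions.

```python
from collections import defaultdict


def total_by_group(docs):
    """First stage: accumulate a total per group of documents."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["group"]] += doc["amount"]
    return dict(totals)


def filter_significant(totals, min_total):
    """Second stage: keep only groups whose total passes the threshold.

    The threshold is a placeholder for a real significance test.
    """
    for group, total in totals.items():
        if total >= min_total:
            yield group, total
```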

> Based on my experience with couch, I think there are a finite number of
> complex view operations that should find their way into the primary code
> path, of which yours is one. I might be able to help implement a level of
> abstraction to encapsulate all such operations, if you are similarly
> interested.
>

I have an interest in GeoCouch and am thinking of the common query
language as a start for defining the view operations, but there might
be other query protocols that are better suited. Is there a JSONQL,
for example?

> What do you think?
>
I am interested in any help, so thanks for letting me know.

> Cheers,
> -R
>
>
>