Posted to user@couchdb.apache.org by Nick Poulden <ni...@domine.co.uk> on 2010/04/21 03:08:40 UTC

Intersecting several views

I'm trying to create a bulk email app that can take a database of user
records and filter them by any number of attributes, much like the 'Advanced
Segments' feature of Google Analytics. For example, you could specify that
the email was to be sent to all 'Gold' users who registered more than 3
months ago who haven't logged in in the last 2 weeks.

My question is how to approach this filtering problem with CouchDB. At the
moment I have separate views on 'account_type', 'registration_date' and
'last_login_date'. My program queries each view separately and returns the
intersection of the resulting key arrays. This works OK while my database
is small, but it doesn't scale, because all the document keys have to be
loaded into memory before the intersection can happen.
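
For reference, the approach described above can be sketched roughly as
follows. This is only an illustration, not the poster's actual code: the
function name and the document IDs are made up, and the three lists stand in
for the IDs returned by querying each view. It shows where the memory cost
comes from, since every result has to be held as a set before intersecting.

```python
from typing import Iterable, List, Set


def intersect_view_ids(results: List[Iterable[str]]) -> Set[str]:
    """Return the document IDs that appear in every view result."""
    if not results:
        return set()
    # Every result is materialised as a set; this is the memory cost at issue.
    id_sets = sorted((set(r) for r in results), key=len)
    matched = id_sets[0]  # start from the smallest set
    for ids in id_sets[1:]:
        matched = matched & ids
        if not matched:
            break  # nothing left to match, stop early
    return matched


# Hypothetical IDs standing in for the rows returned by the three views.
gold_users = ["u1", "u2", "u3", "u5"]
registered_over_3_months = ["u2", "u3", "u4", "u5"]
inactive_2_weeks = ["u3", "u5", "u9"]

recipients = intersect_view_ids(
    [gold_users, registered_over_3_months, inactive_2_weeks]
)
print(sorted(recipients))
```

Starting from the smallest set keeps the running intersection small, but it
does not remove the need to fetch every matching ID from every view.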

Does anyone have any ideas for a better way to approach this problem?

Thanks,

Nick

Re: Intersecting several views

Posted by Dean Landolt <de...@deanlandolt.com>.
On Tue, Apr 20, 2010 at 9:42 PM, Ning Tan <ni...@gmail.com> wrote:

> On Tue, Apr 20, 2010 at 9:08 PM, Nick Poulden <ni...@domine.co.uk> wrote:
> > My question is how to approach this filtering problem with couchdb. At the
> > moment I have different views on 'account_type', 'registration_date' and
> > 'last_login_date'. My program queries each view separately and returns an
> > intersection of the resulting key arrays. This is working ok while my
> > database is small but it's not very scalable as all the document keys have
> > to be loaded in to memory before the intersection can happen.
>
> For these "ad hoc" queries, I would suggest a separate indexing system
> (e.g. Couch-Lucene, etc).
>
> We use a Solr system for similar purposes.
>

Historically the solution to this problem has always been *run all your
changes through something else*. I'm pretty down on that as an actual
solution -- it's feasible but makes *something else* a pretty bigass
bottleneck. Now you have to have a distributed *something else* and a
distributed couch -- there are obvious benefits to a distributed couch but
your *something else* is just serving up answers to specific ad-hoc
questions...

There's a pretty slick solution built into couch -- replication -- but the
hard part is getting couch's replication to talk to your *something else*.
I'm trying to fix that now, and it ought to be easy. I'll report back to this
list once I get couch replication working on Persevere 2.0, but the idea is
that all your couch _changes should be able to flow into some other defined
store (mongo, lucene, sql, whatever), and from your couchapp (or anywhere
else) you should be able to hit your *something else* to resolve multiterm
queries.
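
As a rough illustration of that idea (not the Persevere work mentioned
above), the sketch below applies change rows to a SQL table and answers a
multiterm query from it. Everything here is an assumption made for the
example: the in-memory SQLite store, the function names, and the document
fields, which are simply named after the three views in the original
question. The change rows imitate the shape of _changes output fetched with
include_docs=true; actually reading the feed over HTTP is left out.

```python
import sqlite3
from typing import List


def open_index() -> sqlite3.Connection:
    """Create the hypothetical secondary store (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id TEXT PRIMARY KEY, account_type TEXT, "
        "registration_date TEXT, last_login_date TEXT)"
    )
    return conn


def apply_change(conn: sqlite3.Connection, change: dict) -> None:
    """Mirror one change row into the store: upsert, or delete if removed."""
    if change.get("deleted"):
        conn.execute("DELETE FROM users WHERE id = ?", (change["id"],))
        return
    doc = change["doc"]
    conn.execute(
        "INSERT OR REPLACE INTO users VALUES (?, ?, ?, ?)",
        (
            change["id"],
            doc.get("account_type"),
            doc.get("registration_date"),
            doc.get("last_login_date"),
        ),
    )


def find_recipients(conn: sqlite3.Connection, account_type: str,
                    registered_before: str, last_login_before: str) -> List[str]:
    """Resolve all three conditions in one query; ISO dates compare as text."""
    rows = conn.execute(
        "SELECT id FROM users WHERE account_type = ? "
        "AND registration_date < ? AND last_login_date < ? ORDER BY id",
        (account_type, registered_before, last_login_before),
    )
    return [row[0] for row in rows]


# Made-up change rows for the example.
changes = [
    {"id": "u1", "doc": {"account_type": "Gold",
                         "registration_date": "2009-11-02",
                         "last_login_date": "2010-04-19"}},
    {"id": "u2", "doc": {"account_type": "Gold",
                         "registration_date": "2009-12-15",
                         "last_login_date": "2010-03-01"}},
    {"id": "u3", "doc": {"account_type": "Silver",
                         "registration_date": "2009-10-01",
                         "last_login_date": "2010-02-11"}},
    {"id": "u4", "doc": {"account_type": "Gold",
                         "registration_date": "2009-08-01",
                         "last_login_date": "2010-01-05"}},
    {"id": "u4", "deleted": True},
]

index = open_index()
for change in changes:
    apply_change(index, change)

# 'Gold' users registered over 3 months ago with no login in the last 2 weeks,
# with both cutoff dates computed relative to 2010-04-21.
recipients = find_recipients(index, "Gold", "2010-01-21", "2010-04-07")
print(recipients)
```

The intersection now happens inside the secondary store, so the client only
receives the IDs that satisfy every condition.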

I'll post back in a few days but just wanted to jump in and point out that
this kind of thing ought to be totally doable.

Re: Intersecting several views

Posted by Ning Tan <ni...@gmail.com>.
On Tue, Apr 20, 2010 at 9:08 PM, Nick Poulden <ni...@domine.co.uk> wrote:
> My question is how to approach this filtering problem with couchdb. At the
> moment I have different views on 'account_type', 'registration_date' and
> 'last_login_date'. My program queries each view separately and returns an
> intersection of the resulting key arrays. This is working ok while my
> database is small but it's not very scalable as all the document keys have
> to be loaded in to memory before the intersection can happen.

For these "ad hoc" queries, I would suggest a separate indexing system
(e.g. couchdb-lucene).

We use a Solr system for similar purposes.