Posted to user@couchdb.apache.org by Daniel Itaboraí <it...@gmail.com> on 2010/03/28 05:09:49 UTC
Lame ad-hoc querying proposal: Map/Filter/Reduce

First of all, excuse my newbie-ness. I've been curious about CouchDB for a
while now and only recently started to tinker with it.

One thing that caught my attention was the querying part. I really like the
whole map/reduce idea for building views, but after reading a little bit
about Raindrop's megaview, I was wondering if there isn't an easier way to do
ad-hoc queries (other than temporary views).

If we look at Couch's views more or less as indexes, we should be able to
pass a user-defined filter function whenever querying a view. That function
would be executed after the map phase and before the reduce phase, once for
each row emitted by the map function. It would receive as arguments the key
and value of that row, as well as the document when include_docs=true, and
would return a boolean indicating whether the row should be kept or filtered
out. If the user specifies a filter function, then the intermediate
reductions stored in the B-tree would have to be ignored while processing the
request. (I don't know how big the performance hit from ignoring these
reductions would be, but it seems to me that it would be linear in the number
of documents retrieved, which can be acceptable in a lot of situations.)
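
To make the proposed ordering (map, then filter, then reduce) concrete, here
is a rough in-memory simulation in Python. It is not CouchDB's API:
run_view, map_by_author, only_published and the sample documents are all
hypothetical. The sketch also simplifies things by recomputing the map output
on every call instead of reading it from the index, by passing reduce only a
plain list of values (no keys, no rereduce), and by assuming that a filter
returning True means "keep this row".

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Doc = Dict[str, Any]
MapFn = Callable[[Doc], Iterable[Tuple[Any, Any]]]
# Hypothetical filter signature: (key, value, doc or None) -> keep the row?
FilterFn = Callable[[Any, Any, Optional[Doc]], bool]


def run_view(
    docs: Iterable[Doc],
    map_fn: MapFn,
    reduce_fn: Callable[[List[Any]], Any],
    filter_fn: Optional[FilterFn] = None,
    include_docs: bool = False,
) -> Any:
    # Map phase: each document may emit zero or more (key, value) rows.
    rows = [(key, value, doc) for doc in docs for key, value in map_fn(doc)]

    # Proposed filter phase: after map, before reduce, once per emitted row.
    # The document itself is only handed to the filter when include_docs is set.
    if filter_fn is not None:
        rows = [
            (key, value, doc)
            for key, value, doc in rows
            if filter_fn(key, value, doc if include_docs else None)
        ]

    # Reduce phase. With a filter in place, reductions precomputed over the
    # unfiltered rows cannot be reused, so every surviving value is reduced
    # from scratch: the cost grows linearly with the rows retrieved.
    return reduce_fn([value for _, value, _ in rows])


# Hypothetical sample data and view functions.
docs = [
    {"_id": "a", "type": "post", "author": "daniel", "words": 120, "published": True},
    {"_id": "b", "type": "post", "author": "jan", "words": 300, "published": False},
    {"_id": "c", "type": "comment", "author": "daniel", "words": 40, "published": True},
    {"_id": "d", "type": "post", "author": "daniel", "words": 80, "published": True},
]


def map_by_author(doc: Doc) -> Iterator[Tuple[Any, Any]]:
    # Emit one row per post: key = author, value = word count.
    if doc["type"] == "post":
        yield doc["author"], doc["words"]


def only_published(key: Any, value: Any, doc: Optional[Doc]) -> bool:
    # "published" is not part of the emitted row, so this needs include_docs.
    return doc is not None and doc["published"]


print(run_view(docs, map_by_author, sum))  # 500
print(run_view(docs, map_by_author, sum, only_published, include_docs=True))  # 200
```

Without a filter, a real view could answer the first query from reductions
already stored in the B-tree. The second query has to visit every row the
map emitted, and the filter is only useful on fields outside the emitted row
if include_docs is set, which adds the cost of loading each document.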

I think this is far from perfect. There are a lot of flaws, such as these
"filter" functions taking too long to run (or potentially entering an
infinite loop), their compilation or interpretation taking too long, and
tying the client to a specific query server language (maybe the client should
also specify the query server language as a mandatory argument). Above all
else, it's just plain ugly to construct these dynamic functions and pass them
around. Despite all that, I think there should be a way to do ad-hoc
querying other than retrieving a whole bunch of documents and discarding
them on the client side.

At the heart of this issue, I think, is the fact that whenever there's a
time vs. space trade-off, CouchDB tends to sacrifice disk space. I believe
that, given Couch's design goals, that's definitely the right choice, but
some form of ad-hoc querying must be supported, even if it's kind of
kludgy.

I really appreciate the work you guys have been doing, and I firmly believe
that Couch will be an even bigger success than it is today. I think the work
done on MVCC and replication really laid an awesome foundation on which many
great things can and definitely will be built.

regards,
Daniel Itaboraí