Posted to dev@couchdb.apache.org by Jan Lehnardt <ja...@apache.org> on 2009/05/13 16:08:31 UTC

View Filter

Hi,

I made views faster! :)

I wrote a patch* that introduces the concept of a view filter.
A new design doc option acts as a document filter and
prevents a doc from getting serialized and sent to the view
server. This is useful to avoid unnecessary computation
when using views that use the `if(doc.type == "foo") {…`
pattern.

* http://github.com/janl/couchdb/commit/a47a4831db74e3e0400c6faaaa29984e10ac861c

The filter works like this:
{
   _id:"_design/test_foo",
   language: "javascript",
   options: {
     filter: [{type: "foo"}, {type: "bar"}] // oh hey, proplists in JSON! :)
   },
   views: {
     all_docs: { // really, only all foo and bar docs
       map: "function(doc) { emit(doc.integer, null); }"
     }
   }
};

If *any* of the `{field, value}`** objects match a *top level* field and
value in a document, it gets sent to the view server.

** Yeah, `field` is not hardcoded to `type`, so if you use `class:"foo"`,
this patch works for you.
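
To make the matching rule concrete, here is a minimal JavaScript sketch of
the semantics described above. It is an illustration only: the patch applies
the check on the CouchDB side, before the doc is serialized, and the name
`matchesFilter` is made up for this example. It also assumes each filter
object holds exactly one field and that values are compared as plain
primitives.

```javascript
// Illustration of the filter semantics only, not the patch's code.
// matchesFilter is a hypothetical name.
// ASSUMPTION: every filter entry is a single {field: value} pair and
// values are primitives that can be compared with ===.
function matchesFilter(filter, doc) {
  // The doc passes if ANY entry matches one of its top-level fields.
  return filter.some(function (condition) {
    var field = Object.keys(condition)[0];
    return doc[field] === condition[field];
  });
}

var filter = [{ type: "foo" }, { type: "bar" }];

var fooPasses = matchesFilter(filter, { _id: "a", type: "foo", integer: 1 }); // true
var bazPasses = matchesFilter(filter, { _id: "b", type: "baz", integer: 2 }); // false
// Only top-level fields count, so a nested type does not match.
var nestedPasses = matchesFilter(filter, { _id: "c", meta: { type: "foo" } }); // false
```

With a filter like this in place, the map function no longer needs its own
`if(doc.type == "foo")` guard for docs the filter already excludes.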

A few notes:

  - It would be nice if we could extend this so…
    No. I don't like to add any more bells and whistles to this as
    eventually this will lead to pure Erlang views which we want
    to get anyway.

  - In the light of other view server improvements, this might prove
    to gain only marginal speed.

  - Can we have a filter per view, not a filter per design doc? — Not
    without major reworking of the view server and with losing other
    optimisations.

I don't think I should just go and commit this without discussion. In
fact, I'd opt to only include this patch if there's demand. I'm happy
to maintain the patch outside of CouchDB for those who need that
speedup.

Cheers
Jan
--


Re: View Filter

Posted by Kore Nordmann <ma...@kore-nordmann.de>.
Jan Lehnardt wrote:
> Hi,
> 
[..]
> I wrote a patch* that introduces the concept of a view filter.
> A new design doc option acts as a document filter and
> prevents a doc from getting serialized and sent to the view
> server. This is useful to avoid unnecessary computation
> when using views that use the `if(doc.type == "foo") {…`
> pattern.
> 
[..]

Nice patch.

I would love to see that change in CouchDB, since some of my
CouchDB-based projects really could use that. I keep my views in different
design documents for maintainability reasons and thus would benefit from
it *a lot*. :)

Kind regards,
Kore

Re: View Filter

Posted by Zachary Zolton <za...@gmail.com>.
Moreover, many of my attempts to have different types of docs in one
database (for joins, etc) have ended up with my moving them into
separate databases. It's been pretty easy (most of the time) to do
that work in my Ruby code!

Re: View Filter

Posted by Zachary Zolton <za...@gmail.com>.
Drat... I may actually have just come from a place where knowing how to
keep my doc types in separate databases (and being able to speed up the
map-reduce churn of a reduce-with-group query with view filters) would
have saved me a TON of work!

Urgh... At worst, I'll put it in my blog...  :^(

On Thu, May 14, 2009 at 8:25 PM, Mark Hammond <sk...@gmail.com> wrote:
> On 15/05/2009 4:47 AM, Brian Candler wrote:
>>
>> On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
>>>
>>> (1) people who are storing large documents in CouchDB but not indexing
>>> them
>>> at all (I guess this is possible, e.g. if the doc ids are well-known or
>>> stored in other documents, but this isn't the most common way of working)
>>
>> The proposal would exclude a document from *all* views in a particular
>> design doc. So you're only going to get a benefit from this if you have a
>> large number of documents (or a number of large documents) which are not
>> required to be indexed in any view in that design doc.
>
> Yep - and that is the point.  Consider Jan's example, where it was filtering
> on doc['type'].  If a database had (say) 10 potential values of 'type', then
> all filters that only care about a single type will only care about 1 in 10
> of those documents.
>
> Taking this to its extreme, we tested Jan's patch on a view which matches
> very few documents in a large database.  Rebuilding that view with a filter
> was 18 times faster than without the filter.  We put this down to the fact
> the filter managed to avoid the json encode/decode step for the vast
> majority of the docs in the database.  IOW, on my test database, 6 minutes
> is spent before the filters can actually do anything (ie, that is just the
> json processing), whereas using the filter to avoid that json step brings it
> down to 20 seconds.
>
> So while not everyone will be able to see such significant speedups, many
> may find it extremely useful.
>
>> And it's reasonable, given that (as I understand it) each document is
>> already only passed once to the view server, in order to be indexed by all
>> the views in that design document.
>
> I agree there is lots that can and should be done to speed up views that do
> indeed care about most of the docs - such views spend less time relatively
> in the json encode step and more time in the interpreter.  As an experiment,
> I "ported" one of our views that does look at most of the docs from
> javascript to erlangview, and the performance increase was far more modest
> (20% maybe).  I suspect the javascript interpreter is faster than erlang, so
> I suspect that there will be a level of view complexity where using
> javascript *increases* view performance over erlang, even when factoring in
> the json processing...
>
> Cheers,
>
> Mark
>

Re: View Filter

Posted by Wojciech Kaczmarek <ka...@gmail.com>.
Just 2 cents

On Fri, May 15, 2009 at 09:38, Brian Candler <B....@pobox.com> wrote:
[cut]
> It might be possible to make the feature more general though. For example,
> suppose each view had its own filter, and the erlang server took the *union*
> of those filters to work out which documents to send. Then, when sending a
> document, it sent a list of which views to process it with. This could be
> used to simplify the view code by removing the doc.type test, whilst getting
> the performance benefit automatically.

+1.
I think this optimization is not premature, and need for it emerged
from experiences of many. It seems sane to limit the number of docs
sent to the view server and speed up the calculation of view indexes
this way.

> Example:
>
>  views:{
>    view1:{
>      filter:[{type:"foo"}],
>      map:...
>    }
>    view2:{
>      filter:[{type:"foo"},{type:"bar"}],
>      map:...
>    }
>  }
>
> When a document of type foo is sent, it would be sent to the view engine
> with a list ["view1","view2"] of the views to be invoked on it. A document
> of type bar would have ["view2"]. A document of type baz would not be sent
> at all.
>
> But maybe this is too complicated, and going further down this route ends up
> with an erlang view server anyway.

+1 ;-]

cheers,
Wojtek

Re: View Filter

Posted by Jan Lehnardt <ja...@apache.org>.
On 15 May 2009, at 09:38, Brian Candler wrote:

> On Fri, May 15, 2009 at 11:25:01AM +1000, Mark Hammond wrote:
>>> The proposal would exclude a document from *all* views in a particular
>>> design doc. So you're only going to get a benefit from this if you have a
>>> large number of documents (or a number of large documents) which are not
>>> required to be indexed in any view in that design doc.
>>
>> Yep - and that is the point.  Consider Jan's example, where it was
>> filtering on doc['type'].  If a database had (say) 10 potential values
>> of 'type', then all filters that only care about a single type will only
>> care about 1 in 10 of those documents.
>
> Sure, as long as *none* of the views in that design document care about a
> significant proportion of the documents.
>
> It's unusual that people will have docs which are completely unindexed, so I
> think this patch mainly helps in the case where the user has 10 separate
> design documents, each of which is only interested in documents of one type.
>
> Of course, that's a perfectly legitimate way of using CouchDB, and I don't
> oppose this change at all.
>
> It might be possible to make the feature more general though. For example,
> suppose each view had its own filter, and the erlang server took the *union*
> of those filters to work out which documents to send. Then, when sending a
> document, it sent a list of which views to process it with. This could be
> used to simplify the view code by removing the doc.type test, whilst getting
> the performance benefit automatically.

Like I said in the original mail, this wouldn't be possible without a
major rewrite of the view server, and I'd rather not do that in the
light of other, more important changes.

Cheers
Jan
--


>
> Example:
>
>  views:{
>    view1:{
>      filter:[{type:"foo"}],
>      map:...
>    }
>    view2:{
>      filter:[{type:"foo"},{type:"bar"}],
>      map:...
>    }
>  }
>
> When a document of type foo is sent, it would be sent to the view engine
> with a list ["view1","view2"] of the views to be invoked on it. A document
> of type bar would have ["view2"]. A document of type baz would not be sent
> at all.
>
> But maybe this is too complicated, and going further down this route ends up
> with an erlang view server anyway.
>
>> Taking this to its extreme, we tested Jan's patch on a view which
>> matches very few documents in a large database.  Rebuilding that view
>> with a filter was 18 times faster than without the filter.  We put this
>> down to the fact the filter managed to avoid the json encode/decode step
>> for the vast majority of the docs in the database.
>
> You also avoided sending the docs over the socket and waiting for the
> response. So maybe latency is also part of the problem. Depends whether the
> view server interface does any sort of pipelining of requests.
>
> Regards,
>
> Brian.
>


Re: View Filter

Posted by Brian Candler <B....@pobox.com>.
On Fri, May 15, 2009 at 11:25:01AM +1000, Mark Hammond wrote:
>> The proposal would exclude a document from *all* views in a particular
>> design doc. So you're only going to get a benefit from this if you have a
>> large number of documents (or a number of large documents) which are not
>> required to be indexed in any view in that design doc.
>
> Yep - and that is the point.  Consider Jan's example, where it was  
> filtering on doc['type'].  If a database had (say) 10 potential values  
> of 'type', then all filters that only care about a single type will only  
> care about 1 in 10 of those documents.

Sure, as long as *none* of the views in that design document care about a
significant proportion of the documents.

It's unusual that people will have docs which are completely unindexed, so I
think this patch mainly helps in the case where the user has 10 separate
design documents, each of which is only interested in documents of one type.

Of course, that's a perfectly legitimate way of using CouchDB, and I don't
oppose this change at all.

It might be possible to make the feature more general though. For example,
suppose each view had its own filter, and the erlang server took the *union*
of those filters to work out which documents to send. Then, when sending a
document, it sent a list of which views to process it with. This could be
used to simplify the view code by removing the doc.type test, whilst getting
the performance benefit automatically.

Example:

  views:{
    view1:{
      filter:[{type:"foo"}],
      map:...
    }
    view2:{
      filter:[{type:"foo"},{type:"bar"}],
      map:...
    }
  }

When a document of type foo is sent, it would be sent to the view engine
with a list ["view1","view2"] of the views to be invoked on it. A document
of type bar would have ["view2"]. A document of type baz would not be sent
at all.

But maybe this is too complicated, and going further down this route ends up
with an erlang view server anyway.
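
A rough JavaScript sketch of the routing in the example above, just to show
the idea. Nothing here exists in CouchDB: `matchesAny` and `viewsFor` are
made-up names, and the sketch assumes each filter entry is a single
{field: value} pair compared against top-level fields.

```javascript
// Hypothetical sketch of per-view filters; none of these names exist in CouchDB.
var views = {
  view1: { filter: [{ type: "foo" }] },
  view2: { filter: [{ type: "foo" }, { type: "bar" }] }
};

// True if any single-field condition equals the doc's top-level field.
function matchesAny(filter, doc) {
  return filter.some(function (condition) {
    var field = Object.keys(condition)[0];
    return doc[field] === condition[field];
  });
}

// Names of the views whose filter accepts the doc. An empty list means the
// doc is outside the union of all filters and would not be sent at all.
function viewsFor(views, doc) {
  return Object.keys(views).filter(function (name) {
    return matchesAny(views[name].filter, doc);
  });
}

var forFoo = viewsFor(views, { type: "foo" }); // ["view1", "view2"]
var forBar = viewsFor(views, { type: "bar" }); // ["view2"]
var forBaz = viewsFor(views, { type: "baz" }); // []
```

The view server would then have to accept that list alongside each doc and
run only the named map functions, which is where the extra complexity lies.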

> Taking this to its extreme, we tested Jan's patch on a view which  
> matches very few documents in a large database.  Rebuilding that view
> with a filter was 18 times faster than without the filter.  We put this  
> down to the fact the filter managed to avoid the json encode/decode step  
> for the vast majority of the docs in the database.

You also avoided sending the docs over the socket and waiting for the
response. So maybe latency is also part of the problem. Depends whether the
view server interface does any sort of pipelining of requests.

Regards,

Brian.

Re: View Filter

Posted by Mark Hammond <sk...@gmail.com>.
On 15/05/2009 4:47 AM, Brian Candler wrote:
> On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
>> (1) people who are storing large documents in CouchDB but not indexing them
>> at all (I guess this is possible, e.g. if the doc ids are well-known or
>> stored in other documents, but this isn't the most common way of working)
>
> The proposal would exclude a document from *all* views in a particular
> design doc. So you're only going to get a benefit from this if you have a
> large number of documents (or a number of large documents) which are not
> required to be indexed in any view in that design doc.

Yep - and that is the point.  Consider Jan's example, where it was 
filtering on doc['type'].  If a database had (say) 10 potential values 
of 'type', then all filters that only care about a single type will only 
care about 1 in 10 of those documents.

Taking this to its extreme, we tested Jan's patch on a view which
matches very few documents in a large database.  Rebuilding that view
with a filter was 18 times faster than without the filter.  We put this 
down to the fact the filter managed to avoid the json encode/decode step 
for the vast majority of the docs in the database.  IOW, on my test 
database, 6 minutes is spent before the filters can actually do anything 
(ie, that is just the json processing), whereas using the filter to 
avoid that json step brings it down to 20 seconds.

So while not everyone will be able to see such significant speedups, 
many may find it extremely useful.
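
The arithmetic behind those figures, as a small JavaScript sketch. The
`maxSpeedup` helper is made up for illustration and rests on a simplifying
assumption that is not measured anywhere in this thread: that rebuild time
is proportional to the number of docs serialized and sent, with doc types
evenly distributed.

```javascript
// Figures quoted above: about 6 minutes without the filter, 20 seconds with it.
var unfilteredSeconds = 6 * 60;
var filteredSeconds = 20;
var observedSpeedup = unfilteredSeconds / filteredSeconds; // 18

// Idealised upper bound on the gain. ASSUMPTION: time is proportional to the
// number of docs sent, and each type holds the same number of docs.
function maxSpeedup(typeCount, typesAccepted) {
  return typeCount / typesAccepted;
}

var oneTypeInTen = maxSpeedup(10, 1); // 10
```

Under that simplified model, a filter that keeps 1 type in 10 could give at
most about a 10x gain, so an 18x gain is consistent with a view that matches
well under a tenth of the docs.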

> And it's reasonable, given that (as I understand it) each document is
> already only passed once to the view server, in order to be indexed by all
> the views in that design document.

I agree there is lots that can and should be done to speed up views that 
do indeed care about most of the docs - such views spend less time 
relatively in the json encode step and more time in the interpreter.  As 
an experiment, I "ported" one of our views that does look at most of the 
docs from javascript to erlangview, and the performance increase was far 
more modest (20% maybe).  I suspect the javascript interpreter is faster 
than erlang, so I suspect that there will be a level of view complexity 
where using javascript *increases* view performance over erlang, even 
when factoring in the json processing...

Cheers,

Mark

Re: View Filter

Posted by Brian Candler <B....@pobox.com>.
On Thu, May 14, 2009 at 09:53:14AM -0500, Zachary Zolton wrote:
> (1) people who are storing large documents in CouchDB but not indexing them
> at all (I guess this is possible, e.g. if the doc ids are well-known or
> stored in other documents, but this isn't the most common way of working)

The proposal would exclude a document from *all* views in a particular
design doc. So you're only going to get a benefit from this if you have a
large number of documents (or a number of large documents) which are not
required to be indexed in any view in that design doc.

> I do agree, though, that only being able to filter at the design doc
> level limits the utility of view filtering.

And it's reasonable, given that (as I understand it) each document is
already only passed once to the view server, in order to be indexed by all
the views in that design document.

> Given that a design doc is
> supposed to be an application's "view" of the database, would we want
> to encourage folks to make a different design doc for each type of
> data they store in the database? My gut says "one design doc per
> application" —but I could be all mixed up!

I have been ending up with views which run across *all* the documents in a
database - for example, a generic "search" box which lets the user type in a
search term and hit any matching type of object. Having a single design
document holding all my views means that each document only needs to be sent
once to the view server.

Regards,

Brian.

Re: View Filter

Posted by Jan Lehnardt <ja...@apache.org>.
On 14 May 2009, at 16:53, Zachary Zolton wrote:

> Please reiterate, Brian. I'm not quite sure what you're getting at here:
>
> (1) people who are storing large documents in CouchDB but not indexing them
> at all (I guess this is possible, e.g. if the doc ids are well-known or
> stored in other documents, but this isn't the most common way of working)
>
>
> I do agree, though, that only being able to filter at the design doc
> level limits the utility of view filtering. Given that a design doc is
> supposed to be an application's "view" of the database, would we want
> to encourage folks to make a different design doc for each type of
> data they store in the database?

No! :) The patch is only meant for "post-application". I.e. you create
views and design docs as it makes sense for your application and
then if, and only if, you end up in a situation where this patch gives
you speed, you use it. I don't want to encourage a "one view per
design doc" setup.


> My gut says "one design doc per
> application" —but I could be all mixed up!

Your gut is correct ;) But some apps might need more than one
design doc.

Cheers
Jan
--



>
> Cheers,
> Zach
>
>
> On Thu, May 14, 2009 at 2:43 AM, Brian Candler <B....@pobox.com> wrote:
>> On Wed, May 13, 2009 at 09:41:29AM -0500, Zachary Zolton wrote:
>>> So, this sounds like a big win for those who like to store many
>>> document types in the same database with a "type discriminator" field.
>>
>> ... but only if all views in the same design doc are filtered by the same
>> set of types. That is, you can only use it to exclude documents which are
>> not used by *any* view. Therefore the benefit is for:
>>
>> (1) people who are storing large documents in CouchDB but not indexing them
>> at all (I guess this is possible, e.g. if the doc ids are well-known or
>> stored in other documents, but this isn't the most common way of working)
>>
>> (2) people who have a separate design document for each "type" of object.
>> They would most likely get the same or better performance benefit by having
>> a single design document with all their views.
>>
>> I also think there are other pinch-points in view generation which need
>> working on, although perhaps they are not as quick wins as this one.
>>
>> For example, on my old Thinkpad X30 (mobile P3 1.2GHz), I can insert a set
>> of 1300 documents in ~2 secs using _bulk_docs. However the first view
>> request (generating ~6000 keys) takes around 35 seconds to respond.
>>
>> Regards,
>>
>> Brian.
>>
>


Re: View Filter

Posted by Zachary Zolton <za...@gmail.com>.
Please reiterate, Brian. I'm not quite sure what you're getting at here:

(1) people who are storing large documents in CouchDB but not indexing them
at all (I guess this is possible, e.g. if the doc ids are well-known or
stored in other documents, but this isn't the most common way of working)


I do agree, though, that only being able to filter at the design doc
level limits the utility of view filtering. Given that a design doc is
supposed to be an application's "view" of the database, would we want
to encourage folks to make a different design doc for each type of
data they store in the database? My gut says "one design doc per
application" —but I could be all mixed up!

Cheers,
Zach


On Thu, May 14, 2009 at 2:43 AM, Brian Candler <B....@pobox.com> wrote:
> On Wed, May 13, 2009 at 09:41:29AM -0500, Zachary Zolton wrote:
>> So, this sounds like a big win for those who like to store many
>> document types in the same database with a "type discriminator" field.
>
> ... but only if all views in the same design doc are filtered by the same
> set of types. That is, you can only use it to exclude documents which are
> not used by *any* view. Therefore the benefit is for:
>
> (1) people who are storing large documents in CouchDB but not indexing them
> at all (I guess this is possible, e.g. if the doc ids are well-known or
> stored in other documents, but this isn't the most common way of working)
>
> (2) people who have a separate design document for each "type" of object.
> They would most likely get the same or better performance benefit by having
> a single design document with all their views.
>
> I also think there are other pinch-points in view generation which need
> working on, although perhaps they are not as quick wins as this one.
>
> For example, on my old Thinkpad X30 (mobile P3 1.2GHz), I can insert a set
> of 1300 documents in ~2 secs using _bulk_docs. However the first view
> request (generating ~6000 keys) takes around 35 seconds to respond.
>
> Regards,
>
> Brian.
>

Re: View Filter

Posted by Brian Candler <B....@pobox.com>.
On Wed, May 13, 2009 at 09:41:29AM -0500, Zachary Zolton wrote:
> So, this sounds like a big win for those who like to store many
> document types in the same database with a "type discriminator" field.

... but only if all views in the same design doc are filtered by the same
set of types. That is, you can only use it to exclude documents which are
not used by *any* view. Therefore the benefit is for:

(1) people who are storing large documents in CouchDB but not indexing them
at all (I guess this is possible, e.g. if the doc ids are well-known or
stored in other documents, but this isn't the most common way of working)

(2) people who have a separate design document for each "type" of object.
They would most likely get the same or better performance benefit by having
a single design document with all their views.
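
A back-of-the-envelope JavaScript sketch of point (2). The numbers are
invented for illustration, and `docsSent` is a made-up helper. It assumes
each design doc receives every doc that passes its filter exactly once, and
that doc types are evenly distributed.

```javascript
// ASSUMPTIONS for illustration: each design doc is sent every doc that
// passes its filter exactly once, and doc types are evenly distributed.
function docsSent(docCount, designDocCount, typesAccepted, typeCount) {
  return docCount * designDocCount * typesAccepted / typeCount;
}

var docCount = 10000; // made-up database size
var typeCount = 10;   // made-up number of doc types

// Ten per-type design docs, no filter: every doc goes out ten times.
var separateUnfiltered = docsSent(docCount, 10, typeCount, typeCount); // 100000
// Ten per-type design docs, each filtered to its own type.
var separateFiltered = docsSent(docCount, 10, 1, typeCount); // 10000
// One design doc holding all views, no filter.
var singleDesignDoc = docsSent(docCount, 1, typeCount, typeCount); // 10000
```

Under these assumptions, filtering ten per-type design docs only gets back
to the number of sends that one combined design doc needs anyway.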

I also think there are other pinch-points in view generation which need
working on, although perhaps they are not as quick wins as this one.

For example, on my old Thinkpad X30 (mobile P3 1.2GHz), I can insert a set
of 1300 documents in ~2 secs using _bulk_docs. However the first view
request (generating ~6000 keys) takes around 35 seconds to respond.

Regards,

Brian.

Re: View Filter

Posted by Paul Davis <pa...@gmail.com>.
On Wed, May 13, 2009 at 5:26 PM, kowsik <ko...@gmail.com> wrote:
> We had a thread a month or so ago about view server optimization where
> this came up. By having document 'classes', it's possible to have
> design views that only get applied to certain documents.
>
> While I like this, I can immediately see someone asking for AND
> instead of the OR (implemented below) in the filter. The next
> generalization will be not just top level attributes, but nested ones
> too.
>
> So the optimization really is that when there are multiple view
> functions in a single design document that all apply to the same
> document class, we want to reject early and not have to invoke the map
> function.
>
> One possible alternative is to have the filter implemented in the view
> server, but invoked once for each design document (potentially with
> multiple views). This allows the user to control what the filter
> predicate looks like and can implement any scheme that s/he likes.
>
> Thoughts?
>

My guess is that most of the speedup of filtering comes from avoiding
the serialization/deserialization dance, not the actual application of
JavaScript methods. I.e., once the view server is computing the filter,
it's probably little to no savings.

HTH,
Paul

> K.
>
> On Wed, May 13, 2009 at 7:41 AM, Zachary Zolton
> <za...@gmail.com> wrote:
>> So, this sounds like a big win for those who like to store many
>> document types in the same database with a "type discriminator" field.
>>
>> It often takes me a bit of thinking to decide whether or not to
>> store different document types in the same database. So, I suppose a feature
>> like this would alleviate a possible negative side effect of that
>> choice, by reducing the time it takes to filter out the different
>> document types. Shooting from the hip, I'd say I like this feature!
>>
>> –Zach
>>
>> On Wed, May 13, 2009 at 9:08 AM, Jan Lehnardt <ja...@apache.org> wrote:
>>> Hi,
>>>
>>> I made views faster! :)
>>>
>>> I wrote a patch* that introduces the concept of a view filter.
>>> A new design doc option acts as a document filter and
>>> prevents a doc from getting serialized and sent to the view
>>> server. This is useful to avoid unnecessary computation
>>> when using views that use the `if(doc.type == "foo") {…`
>>> pattern.
>>>
>>> *
>>> http://github.com/janl/couchdb/commit/a47a4831db74e3e0400c6faaaa29984e10ac861c
>>>
>>> The filter works like this:
>>> {
>>>  _id:"_design/test_foo",
>>>  language: "javascript",
>>>  options: {
>>>    filter: [{type: "foo"}, {type: "bar"}] // oh hey, proplists in JSON! :)
>>>  },
>>>  views: {
>>>    all_docs: { // really, only all foo and bar docs
>>>      map: "function(doc) { emit(doc.integer, null); }"
>>>    }
>>>  }
>>> };
>>>
>>> If *any* of the `{field, value}`** objects match a *top level* field and
>>> value in a document, it gets sent to the view server.
>>>
>>> ** Yeah, `field` is not hardcoded to `type`, so if you use `class:"foo"`,
>>> this patch works for you.
>>>
>>> A few notes:
>>>
>>>  - It would be nice if we could extend this so…
>>>   No. I don't like to add any more bells and whistles to this as
>>>   eventually this will lead to pure Erlang views which we want
>>>   to get anyway.
>>>
>>>  - In the light of other view server improvements, this might prove
>>>   to gain only marginal speed.
>>>
>>>  - Can we have a filter per view, not a filter per design doc? — Not
>>>   without major reworking of the view server and with losing other
>>>   optimisations.
>>>
>>> I don't think I should just go and commit this without discussion. In
>>> fact, I'd opt to only include this patch if there's demand. I'm happy
>>> to maintain the patch outside of CouchDB for those who need that
>>> speedup.
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>>
>>
>

Re: View Filter

Posted by kowsik <ko...@gmail.com>.
We had a thread a month or so ago about view server optimization where
this came up. By having document 'classes', it's possible to have
design views that only get applied to certain documents.

While I like this, I can immediately see someone asking for AND
instead of the OR (implemented below) in the filter. The next
generalization will be not just top level attributes, but nested ones
too.

So the optimization really is that when there are multiple view
functions in a single design document that all apply to the same
document class, we want to reject early and not have to invoke the map
function.

One possible alternative is to have the filter implemented in the view
server, but invoked once for each design document (potentially with
multiple views). This allows the user to control what the filter
predicate looks like and can implement any scheme that s/he likes.
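
A minimal JavaScript sketch of what that alternative might look like. This
is hypothetical and not an existing CouchDB feature: the `filter` function
on the design doc, the `indexDoc` helper and the `meta.published` field are
all invented for illustration, and `emit` is passed in as a parameter only
to keep the example self-contained.

```javascript
// Hypothetical: a user-supplied predicate run once per doc in the view
// server, before any map function. Not an existing CouchDB feature.
var designDoc = {
  filter: function (doc) {
    // AND across fields plus a nested field, which the list-of-pairs
    // filter cannot express.
    return doc.type === "foo" && !!doc.meta && doc.meta.published === true;
  },
  views: {
    by_integer: { map: function (doc, emit) { emit(doc.integer, null); } },
    by_id: { map: function (doc, emit) { emit(doc._id, null); } }
  }
};

// Runs the filter once, then every map function of the design doc.
function indexDoc(designDoc, doc) {
  var rows = {};
  if (!designDoc.filter(doc)) {
    return rows; // rejected early: no map function is invoked
  }
  Object.keys(designDoc.views).forEach(function (name) {
    rows[name] = [];
    designDoc.views[name].map(doc, function (key, value) {
      rows[name].push([key, value]);
    });
  });
  return rows;
}

var accepted = indexDoc(designDoc, { _id: "a", type: "foo", integer: 3, meta: { published: true } });
var rejected = indexDoc(designDoc, { _id: "b", type: "foo", integer: 4 });
```

Note that a filter like this still runs inside the view server, so the doc
has already been serialized and sent by the time it is rejected.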

Thoughts?

K.

On Wed, May 13, 2009 at 7:41 AM, Zachary Zolton
<za...@gmail.com> wrote:
> So, this sounds like a big win for those who like to store many
> document types in the same database with a "type discriminator" field.
>
> It often takes me a bit of thinking to decide whether or not to
> store different document types in the same database. So, I suppose a feature
> like this would alleviate a possible negative side effect of that
> choice, by reducing the time it takes to filter out the different
> document types. Shooting from the hip, I'd say I like this feature!
>
> –Zach
>
> On Wed, May 13, 2009 at 9:08 AM, Jan Lehnardt <ja...@apache.org> wrote:
>> Hi,
>>
>> I made views faster! :)
>>
>> I wrote a patch* that introduces the concept of a view filter.
>> A new design doc option acts as a document filter and
>> prevents a doc from getting serialized and sent to the view
>> server. This is useful to avoid unnecessary computation
>> when using views that use the `if(doc.type == "foo") {…`
>> pattern.
>>
>> *
>> http://github.com/janl/couchdb/commit/a47a4831db74e3e0400c6faaaa29984e10ac861c
>>
>> The filter works like this:
>> {
>>  _id:"_design/test_foo",
>>  language: "javascript",
>>  options: {
>>    filter: [{type: "foo"}, {type: "bar"}] // oh hey, proplists in JSON! :)
>>  },
>>  views: {
>>    all_docs: { // really, only all foo and bar docs
>>      map: "function(doc) { emit(doc.integer, null); }"
>>    }
>>  }
>> };
>>
>> If *any* of the `{field, value}`** objects match a *top level* field and
>> value in a document, it gets sent to the view server.
>>
>> ** Yeah, `field` is not hardcoded to `type`, so if you use `class:"foo"`,
>> this patch works for you.
>>
>> A few notes:
>>
>>  - It would be nice if we could extend this so…
>>   No. I don't like to add any more bells and whistles to this as
>>   eventually this will lead to pure Erlang views which we want
>>   to get anyway.
>>
>>  - In the light of other view server improvements, this might prove
>>   to gain only marginal speed.
>>
>>  - Can we have a filter per view, not a filter per design doc? — Not
>>   without major reworking of the view server and with losing other
>>   optimisations.
>>
>> I don't think I should just go and commit this without discussion. In
>> fact, I'd opt to only include this patch if there's demand. I'm happy
>> to maintain the patch outside of CouchDB for those who need that
>> speedup.
>>
>> Cheers
>> Jan
>> --
>>
>>
>

Re: View Filter

Posted by Zachary Zolton <za...@gmail.com>.
So, this sounds like a big win for those who like to store many
document types in the same database with a "type discriminator" field.

It often takes me a bit of thinking to decide whether or not to
store different document types in the same database. So, I suppose a feature
like this would alleviate a possible negative side effect of that
choice, by reducing the time it takes to filter out the different
document types. Shooting from the hip, I'd say I like this feature!

–Zach

On Wed, May 13, 2009 at 9:08 AM, Jan Lehnardt <ja...@apache.org> wrote:
> Hi,
>
> I made views faster! :)
>
> I wrote a patch* that introduces the concept of a view filter.
> A new design doc option acts as a document filter and
> prevents a doc from getting serialized and sent to the view
> server. This is useful to avoid unnecessary computation
> when using views that use the `if(doc.type == "foo") {…`
> pattern.
>
> *
> http://github.com/janl/couchdb/commit/a47a4831db74e3e0400c6faaaa29984e10ac861c
>
> The filter works like this:
> {
>  _id:"_design/test_foo",
>  language: "javascript",
>  options: {
>    filter: [{type: "foo"}, {type: "bar"}] // oh hey, proplists in JSON! :)
>  },
>  views: {
>    all_docs: { // really, only all foo and bar docs
>      map: "function(doc) { emit(doc.integer, null); }"
>    }
>  }
> };
>
> If *any* of the `{field, value}`** objects match a *top level* field and
> value in a document, it gets sent to the view server.
>
> ** Yeah, `field` is not hardcoded to `type`, so if you use `class:"foo"`,
> this patch works for you.
>
> A few notes:
>
>  - It would be nice if we could extend this so…
>   No. I don't like to add any more bells and whistles to this as
>   eventually this will lead to pure Erlang views which we want
>   to get anyway.
>
>  - In the light of other view server improvements, this might prove
>   to gain only marginal speed.
>
>  - Can we have a filter per view, not a filter per design doc? — Not
>   without major reworking of the view server and with losing other
>   optimisations.
>
> I don't think I should just go and commit this without discussion. In
> fact, I'd opt to only include this patch if there's demand. I'm happy
> to maintain the patch outside of CouchDB for those who need that
> speedup.
>
> Cheers
> Jan
> --
>
>