Posted to user@couchdb.apache.org by Wout Mertens <wm...@cisco.com> on 2009/02/11 19:28:34 UTC

Re: [user] Thoughts on document/views...

On Feb 11, 2009, at 7:22 PM, kowsik wrote:

> Just something that occurred to me and wanted to run it by you guys.
> For pcapr, I have a number of different types of documents in
> couch-db, some are comments, some are about the packets, etc. Now, I
> have views that do something like this:
>
> map: function(doc) {
>    if (doc.type == 'comment') emit(...);
> }
>
> With a large set of documents and a large set of views, any new
> document or update to a document is passed to __all__ of the views
> (when the view is eventually invoked). But I "know" that my documents
> come in classes and that only certain views really apply to them. I'm
> thinking of a view as a static method on a class that gets some
> information about the instances.
>
> What I'm getting at is, does it make sense to have some type of
> document "class" attribute and then have views bound to these classes?
> The goal, of course, being that couch-db can pre-filter a lot of these
> things and only run the views for the appropriate types of documents.

I'm probably speaking before my turn being such a newbie, but why  
wouldn't you create a new database for disjoint classes of documents?

Also, once a view is run on a document, it doesn't get re-run that  
often does it?

So other than eating diskspace, maybe there's not really anything  
wrong with keeping everything in one db.

Wout.

Re: [user] Thoughts on document/views...

Posted by Paul Davis <pa...@gmail.com>.

On Wed, Feb 11, 2009 at 1:28 PM, Wout Mertens <wm...@cisco.com> wrote:
> On Feb 11, 2009, at 7:22 PM, kowsik wrote:
>
>> Just something that occurred to me and wanted to run it by you guys.
>> For pcapr, I have a number of different types of documents in
>> couch-db, some are comments, some are about the packets, etc. Now, I
>> have views that do something like this:
>>
>> map: function(doc) {
>>   if (doc.type == 'comment') emit(...);
>> }
>>
>> With a large set of documents and a large set of views, any new
>> document or update to a document is passed to __all__ of the views
>> (when the view is eventually invoked). But I "know" that my documents
>> come in classes and that only certain views really apply to them. I'm
>> thinking of a view as a static method on a class that gets some
>> information about the instances.
>>
>> What I'm getting at is, does it make sense to have some type of
>> document "class" attribute and then have views bound to these classes?
>> The goal, of course, being that couch-db can pre-filter a lot of these
>> things and only run the views for the appropriate types of documents.
>
> I'm probably speaking before my turn being such a newbie, but why wouldn't
> you create a new database for disjoint classes of documents?
>
> Also, once a view is run on a document, it doesn't get re-run that often
> does it?
>
> So other than eating diskspace, maybe there's not really anything wrong with
> keeping everything in one db.
>
> Wout.
>
>

There's nothing at all wrong with keeping many different document
types in a single database. 1 db != 1 table, etc.

When a view is rerun, the map function is executed only once for each
document that changed since the last run. It doesn't matter how many
times the document changed in between.
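
A minimal sketch of that behaviour, assuming a toy in-memory indexer
rather than CouchDB's actual code. The names makeIndexer, update and
stats are hypothetical, and emit is passed as a second argument here
only to keep the example self-contained (CouchDB map functions call a
global emit):

```javascript
// Toy in-memory model of incremental view indexing. This is NOT CouchDB's
// implementation; makeIndexer, update and stats are hypothetical names.
function makeIndexer(mapFn) {
  let lastSeq = 0;         // highest update sequence already indexed
  let mapCalls = 0;        // how many times the map function has run
  const rows = new Map();  // doc id -> rows emitted for its latest revision

  return {
    // changes: array of { seq, doc }; the same doc may appear several times.
    update(changes) {
      // Keep only the newest not-yet-indexed change per document id.
      const latest = new Map();
      for (const change of changes) {
        if (change.seq <= lastSeq) continue;
        const seen = latest.get(change.doc._id);
        if (!seen || change.seq > seen.seq) latest.set(change.doc._id, change);
      }
      // Run the map function once per changed document.
      for (const { seq, doc } of latest.values()) {
        const emitted = [];
        mapFn(doc, (key, value) => emitted.push({ key, value }));
        mapCalls += 1;
        rows.set(doc._id, emitted);
        lastSeq = Math.max(lastSeq, seq);
      }
    },
    stats() {
      return { lastSeq, mapCalls, indexedDocs: rows.size };
    },
  };
}
```

Feeding this three changes that touch two documents runs the map
function twice, and repeating the same update runs it zero more times.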

HTH,
Paul Davis

Re: [user] Thoughts on document/views...

Posted by Dean Landolt <de...@deanlandolt.com>.

On Wed, Feb 11, 2009 at 4:45 PM, Damien Katz <da...@apache.org> wrote:

>
> On Feb 11, 2009, at 4:29 PM, Chris Anderson wrote:
>
>  On Wed, Feb 11, 2009 at 10:28 AM, Wout Mertens <wm...@cisco.com>
>> wrote:
>>
>>> On Feb 11, 2009, at 7:22 PM, kowsik wrote:
>>>
>>>
>>>> What I'm getting at is, does it make sense to have some type of
>>>> document "class" attribute and then have views bound to these classes?
>>>> The goal, of course, being that couch-db can pre-filter a lot of these
>>>> things and only run the views for the appropriate types of documents.
>>>>
>>>
>>> I'm probably speaking before my turn being such a newbie, but why
>>> wouldn't
>>> you create a new database for disjoint classes of documents?
>>>
>>
>> That's the basic rule of thumb: documents only need to be in the same
>> database as each other if they need to be run through the same
>> views (or they need to be replicated together).
>>
>> If you have a db with millions of records, but only a handful of them
>> are interesting to your views, you will save a lot of serialization
>> by putting the viewed documents in their own database.
>>
>>
>>> Also, once a view is run on a document, it doesn't get re-run that often
>>> does it?
>>>
>>
>> Views are run once per document update, so if you write the document
>> once, it only gets run through the view server once.
>>
>> If you have say, 5 different document "classes" and you have views
>> that care about each of them, if you put those views in the same
>> design document, then you will not have extra serialization hits. Each
>> document update is sent to the view server once per design document.
>>
>>
>>> So other than eating diskspace, maybe there's not really anything wrong
>>> with
>>> keeping everything in one db.
>>>
>>
>> My recommendation is to not worry about it, unless you have like 75%
>> or more docs that don't show up in any view. In that case, those docs
>> might be better off in another database.
>>
>
> I agree with Chris. The way view indexes are updated en-masse, it makes
> this mostly a non-issue.


Antony did bring up an interesting (perhaps inevitable) optimization though
-- in light of jchris' comments the guards won't do much good, but it'd make
an interesting (post-1.0, obviously) addition to the Erlang API to be able
to bypass view server I/O and JSON roundtripping. I probably wouldn't ever
need it, and at first I'd gag a little when I cracked open someone else's
Erlang design doc, but it could lower the penalty for having a number of
boutique third-party design docs in a database.

Re: [user] Thoughts on document/views...

Posted by Damien Katz <da...@apache.org>.

On Feb 11, 2009, at 4:29 PM, Chris Anderson wrote:

> On Wed, Feb 11, 2009 at 10:28 AM, Wout Mertens <wm...@cisco.com>  
> wrote:
>> On Feb 11, 2009, at 7:22 PM, kowsik wrote:
>>
>>>
>>> What I'm getting at is, does it make sense to have some type of
>>> document "class" attribute and then have views bound to these  
>>> classes?
>>> The goal, of course, being that couch-db can pre-filter a lot of  
>>> these
>>> things and only run the views for the appropriate types of  
>>> documents.
>>
>> I'm probably speaking before my turn being such a newbie, but why  
>> wouldn't
>> you create a new database for disjoint classes of documents?
>
> That's the basic rule of thumb: documents only need to be in the same
> database as each other if they need to be run through the same
> views (or they need to be replicated together).
>
> If you have a db with millions of records, but only a handful of them
> are interesting to your views, you will save a lot of serialization
> by putting the viewed documents in their own database.
>
>>
>> Also, once a view is run on a document, it doesn't get re-run that  
>> often
>> does it?
>
> Views are run once per document update, so if you write the document
> once, it only gets run through the view server once.
>
> If you have say, 5 different document "classes" and you have views
> that care about each of them, if you put those views in the same
> design document, then you will not have extra serialization hits. Each
> document update is sent to the view server once per design document.
>
>>
>> So other than eating diskspace, maybe there's not really anything  
>> wrong with
>> keeping everything in one db.
>
> My recommendation is to not worry about it, unless you have like 75%
> or more docs that don't show up in any view. In that case, those docs
> might be better off in another database.

I agree with Chris. The way view indexes are updated en-masse, it  
makes this mostly a non-issue.

-Damien

Re: [user] Thoughts on document/views...

Posted by Chris Anderson <jc...@apache.org>.

On Wed, Feb 11, 2009 at 10:28 AM, Wout Mertens <wm...@cisco.com> wrote:
> On Feb 11, 2009, at 7:22 PM, kowsik wrote:
>
>>
>> What I'm getting at is, does it make sense to have some type of
>> document "class" attribute and then have views bound to these classes?
>> The goal, of course, being that couch-db can pre-filter a lot of these
>> things and only run the views for the appropriate types of documents.
>
> I'm probably speaking before my turn being such a newbie, but why wouldn't
> you create a new database for disjoint classes of documents?

That's the basic rule of thumb: documents only need to be in the same
database as each other if they need to be run through the same
views (or they need to be replicated together).

If you have a db with millions of records, but only a handful of them
are interesting to your views, you will save a lot of serialization
by putting the viewed documents in their own database.

>
> Also, once a view is run on a document, it doesn't get re-run that often
> does it?

Views are run once per document update, so if you write the document
once, it only gets run through the view server once.

If you have say, 5 different document "classes" and you have views
that care about each of them, if you put those views in the same
design document, then you will not have extra serialization hits. Each
document update is sent to the view server once per design document.
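
As a rough illustration, one design document can hold a guarded view
per document "class". The design doc name (pcapr), the view names, and
the fields packet_id and protocol are made up for this example, and
runDesignDoc is only a toy stand-in for the view server, not how
CouchDB evaluates views internally:

```javascript
// Hypothetical design document: the names and document fields below are
// illustrative only. Map functions are stored as source strings.
const designDoc = {
  _id: "_design/pcapr",
  views: {
    comments: {
      map: "function(doc) { if (doc.type == 'comment') emit(doc.packet_id, null); }",
    },
    packets: {
      map: "function(doc) { if (doc.type == 'packet') emit(doc.protocol, null); }",
    },
  },
};

// Toy stand-in for the view server: the document is handed over once and
// every map function in the design document runs against that one copy.
function runDesignDoc(ddoc, doc) {
  const results = {};
  for (const [name, view] of Object.entries(ddoc.views)) {
    const rows = [];
    const emit = (key, value) => rows.push([key, value]);
    // Compile the map source with emit in scope, as a view server would.
    const map = new Function("emit", "return (" + view.map + ");")(emit);
    map(doc);
    results[name] = rows;
  }
  return results;
}
```

A comment document produces a row only in the comments view; the
packets view sees it, fails the guard, and emits nothing.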

>
> So other than eating diskspace, maybe there's not really anything wrong with
> keeping everything in one db.

My recommendation is to not worry about it, unless you have like 75%
or more docs that don't show up in any view. In that case, those docs
might be better off in another database.
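
One way to check a sample of documents against that rule of thumb,
sketched under assumptions: unindexedFraction is a hypothetical helper,
and the map functions take emit as a second argument purely so the
example is self-contained.

```javascript
// Hypothetical helper: fraction of docs that emit nothing in any view.
// docs is an array of plain objects; mapFns is an array of (doc, emit) functions.
function unindexedFraction(docs, mapFns) {
  if (docs.length === 0) return 0;
  let silent = 0;
  for (const doc of docs) {
    let emitted = false;
    const emit = () => { emitted = true; };
    for (const map of mapFns) map(doc, emit);
    if (!emitted) silent += 1;
  }
  return silent / docs.length;
}
```

If the result for a representative sample is around 0.75 or higher,
the silent documents are candidates for a separate database.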

Chris

-- 
Chris Anderson
http://jchris.mfdz.com