Posted to user@couchdb.apache.org by Stanley Iriele <si...@gmail.com> on 2014/07/11 00:44:25 UTC

Control view query performance

When a doc needs to be calculated for a particular set of entries from the
changes feed, are docs sent one at a time or in batches? And is there
just one view server doing all of the computation?

Lastly, is there a way to configure or control any of the settings that
would dictate the above, like the number of view servers or the batch size
for docs? And if not, should there be? I've been very curious
about this for a while. I looked at a few view server implementations,
but... meh.

Re: Control view query performance

Posted by Stanley Iriele <si...@gmail.com>.
Interesting... does the native view server have the same relationship with
CouchDB? By that I mean: is there a pool of processes that read db files
from disk, or is its relationship completely different?

Also, can (and should) the number of spun-up processes be configurable?

By the way, many thanks Robert/Jens, this is spot on... exactly what I was
looking for!

Lastly, a lot of you guys' thorough explanations about these sorts of things
would make for an excellent FAQ section in the docs to come.

Re: Control view query performance

Posted by Jens Alfke <je...@couchbase.com>.
On Jul 14, 2014, at 11:29 AM, I wrote:

> Really? I thought that the map function would be run in parallel on several documents at once. Seems like an obvious way to speed up view updates; after all, map/reduce as popularized by Google is intended to be a massively parallel algorithm…

Sorry, I realize I’ve fallen prey to the “any problem I haven’t worked on must be trivial” attitude, a common pitfall of engineers. :-p

I’m sure there are good reasons this hasn’t been implemented yet, including the fact that JSVMs are single-threaded so you’d have to spin up several of them to be able to run map functions in parallel.
But it would be a good optimization at some point, considering the large number of CPU cores that today’s server boxes have.

—Jens

Re: Control view query performance

Posted by Jens Alfke <je...@couchbase.com>.
On Jul 14, 2014, at 4:16 AM, Robert Samuel Newson <rn...@apache.org> wrote:

> A single view group (all views in the same ddoc) is indexed serially using a single couchjs process (i.e., there is no parallelism here).

Really? I thought that the map function would be run in parallel on several documents at once. Seems like an obvious way to speed up view updates; after all, map/reduce as popularized by Google is intended to be a massively parallel algorithm…

—Jens


Re: Control view query performance

Posted by Robert Samuel Newson <rn...@apache.org>.
CouchDB manages a pool of couchjs processes, any one of which is capable of evaluating JavaScript map/reduce functions.

A single view group (all views in the same ddoc) is indexed serially using a single couchjs process (i.e., there is no parallelism here). In CouchDB 2.0, of course, a database can, and typically does, consist of multiple shards, introducing parallelism to view builds among other effects.

B.
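
A minimal sketch of the scheduling described above, in Python rather than Erlang/JavaScript. It is not CouchDB's code: the names index_view_group, index_all and pool_size are hypothetical, threads stand in for couchjs OS processes, and it assumes that separate view groups may be handed to separate workers. The only point it illustrates is that documents within one view group go through the map functions one after another on a single worker.

```python
from concurrent.futures import ThreadPoolExecutor


def index_view_group(map_funs, docs):
    """Index one view group serially on a single worker.

    map_funs maps a view name to a function returning (key, value) rows.
    """
    rows = {name: [] for name in map_funs}
    for doc in docs:  # one document at a time: no parallelism inside a group
        for name, fun in map_funs.items():
            rows[name].extend(fun(doc))
    return rows


def index_all(groups, docs, pool_size=2):
    """Give each view group (design doc) to one worker from a fixed-size pool.

    Assumption for this sketch: different groups may run concurrently.
    """
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = {ddoc: pool.submit(index_view_group, funs, docs)
                   for ddoc, funs in groups.items()}
        return {ddoc: fut.result() for ddoc, fut in futures.items()}


docs = [{"_id": "a", "type": "post"}, {"_id": "b", "type": "user"}]
groups = {
    "_design/by_type": {"all": lambda d: [(d["type"], d["_id"])]},
    "_design/posts": {
        "ids": lambda d: [(d["_id"], None)] if d["type"] == "post" else []
    },
}
result = index_all(groups, docs)
```

Here result holds one row list per view, each built in document order, because each group was processed by exactly one worker.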




Re: Control view query performance

Posted by Stanley Iriele <si...@gmail.com>.
Firstly, thanks for the answer, I appreciate that. I was in a hurry while
writing the last email.

To be slightly clearer: I know that the _changes feed is generated from
some bookkeeping mechanism internal to the database. When I said
"calculated for a particular set of entries" I meant the actual JavaScript
view server that evaluates your map function once the doc(s) have
been fed to it and that must respond with the emitted results. I... could
have phrased that better.

My questions revolve around how CouchDB utilizes the view server, and whether
there is only one of them running at a given time on a box. The docs suggest
that CouchDB talks to a view server with a line-based protocol, but I don't
know whether there can be more than one view server executing map functions
at a time... and if so, is that a configurable thing?
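
For reference, a toy sketch of the line-based protocol mentioned above, written in Python. It assumes the reset / add_fun / map_doc commands from the documented query server protocol, where each line is a JSON array and map_doc carries one document. Everything else is an assumption made for illustration: handle_line is a hypothetical name, map functions are Python lambdas instead of JavaScript source, and the error reply shape is made up.

```python
import json


def handle_line(line, funs):
    """Process one JSON command line and return the JSON reply line."""
    cmd, *args = json.loads(line)
    if cmd == "reset":
        funs.clear()  # forget all previously added map functions
        return json.dumps(True)
    if cmd == "add_fun":
        # Assumption: the source is a Python lambda, not JavaScript.
        # eval is acceptable only because this is a toy with trusted input.
        funs.append(eval(args[0]))
        return json.dumps(True)
    if cmd == "map_doc":
        doc = args[0]  # exactly one document per command
        # One list of [key, value] rows per registered map function.
        return json.dumps([fun(doc) for fun in funs])
    # Hypothetical error reply; the real protocol's error format may differ.
    return json.dumps({"error": "unknown_command", "reason": cmd})


commands = [
    ["reset"],
    ["add_fun", "lambda doc: [[doc['type'], doc['_id']]]"],
    ["map_doc", {"_id": "a", "type": "post"}],
]
funs = []
replies = [handle_line(json.dumps(c), funs) for c in commands]
```

A real view server would read these lines from stdin and write each reply to stdout; the loop is left out here so the exchange can be followed as plain function calls.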



Re: Control view query performance

Posted by Jens Alfke <je...@couchbase.com>.
I am no expert on the implementation of CouchDB, but I don’t believe your question makes sense in the form that you asked it. The changes feed isn’t related to views at all. Rather, there’s an underlying by-sequence index in the main database that’s scanned to generate that feed. And it doesn’t “calculate” docs; it just returns existing doc IDs and revision IDs.

If that’s not what you were actually asking, you’ll need to rephrase your question more clearly.

—Jens
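
A minimal sketch of a by-sequence index as described above, in Python. It is not CouchDB's storage code: ToyDatabase, save and changes are hypothetical names, and it assumes that each document appears in the index only under its most recent sequence number. The feed is produced by scanning the index in sequence order and returning doc IDs and revision IDs, with no computation on the document bodies.

```python
class ToyDatabase:
    """Toy model of a database with a by-sequence index."""

    def __init__(self):
        self.update_seq = 0
        self.by_seq = {}   # seq -> (doc_id, rev)
        self.seq_of = {}   # doc_id -> seq of its latest update

    def save(self, doc_id, rev):
        """Record an update; the doc moves to the newest sequence number."""
        old = self.seq_of.get(doc_id)
        if old is not None:
            del self.by_seq[old]  # assumption: only the latest entry is kept
        self.update_seq += 1
        self.by_seq[self.update_seq] = (doc_id, rev)
        self.seq_of[doc_id] = self.update_seq

    def changes(self, since=0):
        """Scan the index in sequence order; return (seq, doc_id, rev)."""
        return [(seq, doc_id, rev)
                for seq, (doc_id, rev) in sorted(self.by_seq.items())
                if seq > since]


db = ToyDatabase()
db.save("a", "1-x")
db.save("b", "1-y")
db.save("a", "2-z")
feed = db.changes()
```

After the second save of "a", the feed lists "b" at sequence 2 and "a" at sequence 3; calling db.changes(since=2) would return only the entry for "a".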



Re: Control view query performance

Posted by Stanley Iriele <si...@gmail.com>.
Any thoughts on this? It's not a blocker, I'd just really like to know.

