Posted to user@couchdb.apache.org by Julian Moritz <ma...@julianmoritz.de> on 2010/04/05 18:14:02 UTC

performance issues

Hi,

I've developed what is (in my eyes) a simple view. I have a wordlist
that contains duplicate words. I want to store it in a view, with every
word occurring once and ordered randomly. Therefore I have a simple map
function:

function(doc) {
  emit([hash(doc.word), doc.word], null);
}
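Note that hash() is not a CouchDB built-in; the map function assumes
some deterministic string hash defined alongside it in the design doc.
A minimal sketch of such a helper (a hypothetical stand-in, not the
author's actual function):

```javascript
// Hypothetical stand-in for hash(): a simple deterministic 31-based
// string hash (not cryptographic; it just scatters keys pseudo-randomly
// so the view sorts in an effectively random, but stable, order).
function hash(s) {
  var h = 0;
  for (var i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0; // keep h an unsigned 32-bit int
  }
  return h;
}
```

Because the hash is deterministic, re-indexing the same words always
produces the same "random" order.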

and a simple reduce:

function(key, values, rereduce) {
  return true;
}

Calling that view with group=true, it does what I want.

When storing plenty of words to the database, one of my two CPU cores
is completely used by couchjs.

Isn't the view built using two (or all) CPU cores? I thought (obviously
I'm wrong) that it would be calculated in parallel, and that a
quad-core (or more cores) would make storing faster.

Is there a solution for that? Should I use another query server?

Regards
Julian

Re: performance issues

Posted by Julian Moritz <ma...@julianmoritz.de>.
Hi,

Julian Moritz schrieb:
> Hi,
> 

I've reimplemented my views in Python (I like it way more than
JavaScript), and my computer is relaxed again.

ps aux | grep couchpy
couchdb   2752  0.3  0.2  10536  7184 ?        Ss   16:05   0:01
/usr/bin/python /usr/local/bin/couchpy
couchdb   2753  0.5  0.2  10084  6712 ?        Ss   16:05   0:02
/usr/bin/python /usr/local/bin/couchpy

The float numbers are CPU and RAM usage. So maybe the main problem is
that I cannot use SpiderMonkey.
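For readers wondering how couchpy ends up handling views: the Python
query server is typically registered in CouchDB's local.ini (a sketch;
the couchpy path must match the local install), and design docs then
declare "language": "python".

```ini
[query_servers]
python = /usr/local/bin/couchpy
```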

best regards
Julian

> [...]

Re: performance issues

Posted by Julian Moritz <ma...@julianmoritz.de>.
Hi,

Adam Kocoloski schrieb:
> 
> On Apr 5, 2010, at 2:52 PM, Julian Moritz <ma...@julianmoritz.de> wrote:
> 
> 
> Hi Julian, it is still true that CouchDB will use only one couchjs
> process for all the map functions in a single design doc. It uses a
> second couchjs for the reduce functions, and of course separate design
> docs get their own processes as well.
> 
> In my experience simple view indexing was almost always limited by the
> Erlang VM, so parallelizing was premature. If you've got a modern
> SpiderMonkey and you're still CPU limited perhaps that's no longer the
> case.  Can you remind us of the Couch and SM versions here?
> 
> Adam

I'm not using any standalone version of SpiderMonkey. If I installed
SpiderMonkey on my Ubuntu laptop, I couldn't use most of the programs I
need (e.g. Firefox and Eclipse).

I've configured CouchDB (version 0.11) like this:

--with-js-lib=/usr/lib/xulrunner-devel-1.9.1.8/lib/
--with-js-include=/usr/lib/xulrunner-devel-1.9.1.8/include

Best Regards
Julian

> [...]

Re: performance issues

Posted by Adam Kocoloski <ad...@gmail.com>.
On Apr 5, 2010, at 2:52 PM, Julian Moritz <ma...@julianmoritz.de>  
wrote:

> Hi,
>
> Julian Moritz schrieb:
>> Hi,
>>
>
> I've just found this via google:
>
>>> We don't parallelize view index creation yet, so this is not an
>>> additional problem for you. You can however build two views in
>>> parallel and make use of two cores that way.
>>
>
> If this is (still) true, view index creation is the bottleneck of my
> application. Hence I'm just playing around and yet using 100% of my
> core, I cannot use CouchDB anymore.
>
> Regards
> Julian

Hi Julian, it is still true that CouchDB will use only one couchjs  
process for all the map functions in a single design doc. It uses a  
second couchjs for the reduce functions, and of course separate design  
docs get their own processes as well.

In my experience simple view indexing was almost always limited by the  
Erlang VM, so parallelizing was premature. If you've got a modern  
SpiderMonkey and you're still CPU limited perhaps that's no longer the  
case.  Can you remind us of the Couch and SM versions here?

Adam

> [...]

Re: performance issues

Posted by Robert Newson <ro...@gmail.com>.
On reflection, I (partially) retract that. It works for the default
group_level setting, so it implicitly does what you need. A reduce that
ignores all the input parameters is going to behave oddly for different
group_level settings.

On Mon, Apr 5, 2010 at 9:03 PM, Robert Newson <ro...@gmail.com> wrote:
> I don't think your reduce is making the results unique. Rather, it's
> non-deterministically discarding rows. Where couchdb calls the reduce
> method, all of the input rows it's selected (outside of your control)
> are reduced to 'true'. I think it just appears to be working but
> isn't.
>
> Further, I don't think a reduce in couchdb can make a views entries
> unique even in principle.
>
> B.
> [...]

Re: performance issues

Posted by Robert Newson <ro...@gmail.com>.
I don't think your reduce is making the results unique. Rather, it's
non-deterministically discarding rows. When couchdb calls the reduce
method, all of the input rows it has selected (outside of your control)
are reduced to 'true'. I think it just appears to be working but
isn't.

Further, I don't think a reduce in couchdb can make a view's entries
unique even in principle.
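A toy model of group=true may make this concrete (a sketch of the
observable behavior only, not CouchDB's internals): rows are grouped by
their exact key, and each group's values collapse to whatever the
reduce returns, so a constant reduce yields one row per distinct key
while silently discarding everything else.

```javascript
// Toy model of group=true (observable behavior only, not internals):
// rows are bucketed by exact key, then each bucket's values are fed
// to the reduce function. A constant reduce collapses every bucket
// to `true`, so the output has one row per distinct key.
function groupReduce(rows, reduceFn) {
  var groups = {};
  rows.forEach(function (row) {
    var k = JSON.stringify(row.key);
    (groups[k] = groups[k] || []).push(row.value);
  });
  return Object.keys(groups).map(function (k) {
    return { key: JSON.parse(k), value: reduceFn(null, groups[k], false) };
  });
}

var constantReduce = function (keys, values, rereduce) { return true; };
var rows = [
  { key: "foo", value: null },
  { key: "foo", value: null },
  { key: "bar", value: null }
];
var result = groupReduce(rows, constantReduce);
// result: [{key: "foo", value: true}, {key: "bar", value: true}]
```

Nothing guarantees which rows CouchDB hands to a given reduce call,
which is why this only *looks* like deduplication.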

B.

On Mon, Apr 5, 2010 at 8:19 PM, Julian Moritz <ma...@julianmoritz.de> wrote:
> Hi J Chris,
>
> J Chris Anderson schrieb:
>> On Apr 5, 2010, at 11:52 AM, Julian Moritz wrote:
>>
>>> Hi,
>>>
>>> Julian Moritz schrieb:
>>>> Hi,
>>>>
>>> I've just found this via google:
>>>
>>>>> We don't parallelize view index creation yet, so this is not an
>>>>> additional problem for you. You can however build two views in
>>>>> parallel and make use of two cores that way.
>>> If this is (still) true, view index creation is the bottleneck of my
>>> application. Hence I'm just playing around and yet using 100% of my
>>> core, I cannot use CouchDB anymore.
>>>
>>
>> We rarely see view generation that is actually limited by view-function execution speed. The majority of the time the actual bottleneck is disk IO. To parallelize view generation the best option is to run a CouchDB-Lounge cluster.
>>
>
> Hm, at the moment I have access to two computers. This isn't what you
> mean with a "couchdb-lounge cluster", right?
>
>> It looks like you might be better of removing your reduce function, which might also speed things up.
>>
>
> But I need it for making my list unique. This is an important feature
> for my application.
>
> Thanks, I'll think about how to set up a couchdb cluster and do more
> testing.
>
> Regards
> Julian
>
>> [...]

Re: performance issues

Posted by Julian Moritz <ma...@julianmoritz.de>.
Hi,

Jan Lehnardt schrieb:
>
>> my reduce-function does a `return True` atm. Would it really be a speed-up?
>
> Not using JavaScript will be faster than using JavaScript. Feel free to write your
> views in Erlang completely :) "_sum" gives you a convenient middle-ground.

I've got that Erlang book on my bookshelf, but I'm faster developing
with Python. Well, I'm just playing around and testing.

Best regards
Julian

> [...]

Re: performance issues

Posted by Jan Lehnardt <ja...@apache.org>.
On 6 Apr 2010, at 16:45, Julian Moritz wrote:

> Hi Jan,
> 
> Jan Lehnardt schrieb:
>> On 6 Apr 2010, at 10:23, Julian Moritz wrote:
>> 
>>> Hi,
>>> 
>>> first of all: to make the keys of a view unique via reduce, i do it as
>>> it's desribed in the "definitive guide to couchdb".
>> 
>> try using "_sum" as your reduce function, it'll use an Erlang version that
>> will be faster than the JS version. "_sum" expects the emit() value to be
>> an integer. Just emit(doc.whatever, 1).
>> 
> 
> my reduce-function does a `return True` atm. Would it really be a speed-up?

Not using JavaScript will be faster than using JavaScript. Feel free to write your
views in Erlang completely :) "_sum" gives you a convenient middle-ground.

> I just did what you are writing on p 185 in "the definitive guide".

Yeah, I wrote that, but read on :)

Cheers
Jan
--

> [...]


Re: performance issues

Posted by Julian Moritz <ma...@julianmoritz.de>.
Hi Jan,

Jan Lehnardt schrieb:
> On 6 Apr 2010, at 10:23, Julian Moritz wrote:
> 
>> Hi,
>>
>> first of all: to make the keys of a view unique via reduce, i do it as
>> it's desribed in the "definitive guide to couchdb".
> 
> try using "_sum" as your reduce function, it'll use an Erlang version that
> will be faster than the JS version. "_sum" expects the emit() value to be
> an integer. Just emit(doc.whatever, 1).
> 

my reduce function just does a `return true` atm. Would it really be a speed-up?

I just did what you are writing on p 185 in "the definitive guide".

regards
Julian

> [...]

Re: performance issues

Posted by Jan Lehnardt <ja...@apache.org>.
On 6 Apr 2010, at 10:23, Julian Moritz wrote:

> Hi,
> 
> first of all: to make the keys of a view unique via reduce, i do it as
> it's desribed in the "definitive guide to couchdb".

try using "_sum" as your reduce function, it'll use an Erlang version that
will be faster than the JS version. "_sum" expects the emit() value to be
an integer. Just emit(doc.whatever, 1).
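In JS terms, the built-in "_sum" behaves like the sketch below; the
point of "_sum" is that CouchDB runs it in Erlang, skipping the JSON
round-trip to couchjs for every reduce call.

```javascript
// Sketch of what the built-in "_sum" reduce computes. With
// emit(doc.word, 1) in the map and group=true on the query, each
// result row becomes (word, number_of_occurrences). Summing works the
// same for the reduce and rereduce cases, so rereduce can be ignored.
function sumReduce(keys, values, rereduce) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}
```

For example, sumReduce(null, [1, 1, 1], false) gives 3, i.e. a word
stored three times shows up once, with a count of 3.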

Cheers
Jan
--

> [...]


Re: performance issues

Posted by Julian Moritz <ma...@julianmoritz.de>.
Hi,

First of all: to make the keys of a view unique via reduce, I do it as
it's described in the "definitive guide to couchdb".

J Chris Anderson schrieb:
> On Apr 5, 2010, at 12:19 PM, Julian Moritz wrote:
> 
>> But I need it for making my list unique. This is an important feature
>> for my application.
> 
> This is probably explains the slowness. When you do a group=true query, CouchDB has to run the reduce function once for each unique key (serializing all the rows in the key to the JS process, and parsing the results.)
> 
> I haven't tested this, but you might get better response throughput by dropping the reduce function and using a _list which only sends one row of output each time the key changes. This will avoid some additional Erlang processing of the result.
> 
> Some documentation for _list is here:
> 
> http://books.couchdb.org/relax/design-documents/lists

Thank you, I will have a look on it.

Regards
Julian

> [...]

Re: performance issues

Posted by J Chris Anderson <jc...@gmail.com>.
On Apr 5, 2010, at 12:19 PM, Julian Moritz wrote:

> Hi J Chris,
> 
> J Chris Anderson schrieb:
>> On Apr 5, 2010, at 11:52 AM, Julian Moritz wrote:
>> 
>>> Hi,
>>> 
>>> Julian Moritz schrieb:
>>>> Hi,
>>>> 
>>> I've just found this via google:
>>> 
>>>>> We don't parallelize view index creation yet, so this is not an
>>>>> additional problem for you. You can however build two views in
>>>>> parallel and make use of two cores that way.
>>> If this is (still) true, view index creation is the bottleneck of my
>>> application. Hence I'm just playing around and yet using 100% of my
>>> core, I cannot use CouchDB anymore.
>>> 
>> 
>> We rarely see view generation that is actually limited by view-function execution speed. The majority of the time the actual bottleneck is disk IO. To parallelize view generation the best option is to run a CouchDB-Lounge cluster.
>> 
> 
> Hm, at the moment I have access to two computers. This isn't what you
> mean with a "couchdb-lounge cluster", right?
> 
>> It looks like you might be better of removing your reduce function, which might also speed things up.
>> 
> 
> But I need it for making my list unique. This is an important feature
> for my application.

This probably explains the slowness. When you do a group=true query, CouchDB has to run the reduce function once for each unique key (serializing all the rows in the key to the JS process, and parsing the results).

I haven't tested this, but you might get better response throughput by dropping the reduce function and using a _list which only sends one row of output each time the key changes. This will avoid some additional Erlang processing of the result.

Some documentation for _list is here:

http://books.couchdb.org/relax/design-documents/lists
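The key-change trick could look like the sketch below (an assumed
shape: inside CouchDB, a real _list function would pull rows with
getRow() and write output with send(), but the dedup logic is the
same). It relies on view rows arriving sorted by key:

```javascript
// Emit one row per unique key from an already-sorted row stream.
// Rows whose key matches the previous row's key are skipped, so no
// reduce function is needed at all.
function dedupSortedRows(rows) {
  var out = [];
  var prevKey;
  rows.forEach(function (row) {
    var k = JSON.stringify(row.key); // keys may be arrays, so compare serialized
    if (k !== prevKey) {
      out.push(row);
      prevKey = k;
    }
  });
  return out;
}
```

For the word view, rows sorted by [hash(word), word] would come out
with each word exactly once.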

> [...]


Re: performance issues

Posted by Julian Moritz <ma...@julianmoritz.de>.
Hi J Chris,

J Chris Anderson schrieb:
> On Apr 5, 2010, at 11:52 AM, Julian Moritz wrote:
> 
>> Hi,
>>
>> Julian Moritz schrieb:
>>> Hi,
>>>
>> I've just found this via google:
>>
>>>> We don't parallelize view index creation yet, so this is not an
>>>> additional problem for you. You can however build two views in
>>>> parallel and make use of two cores that way.
>> If this is (still) true, view index creation is the bottleneck of my
>> application. Hence I'm just playing around and yet using 100% of my
>> core, I cannot use CouchDB anymore.
>>
> 
> We rarely see view generation that is actually limited by view-function execution speed. The majority of the time the actual bottleneck is disk IO. To parallelize view generation the best option is to run a CouchDB-Lounge cluster.
> 

Hm, at the moment I have access to two computers. That isn't what you
mean by a "CouchDB-Lounge cluster", right?

> It looks like you might be better of removing your reduce function, which might also speed things up.
> 

But I need it for making my list unique. This is an important feature
for my application.

Thanks, I'll think about how to set up a couchdb cluster and do more
testing.

Regards
Julian

> [...]

Re: performance issues

Posted by J Chris Anderson <jc...@gmail.com>.
On Apr 5, 2010, at 11:52 AM, Julian Moritz wrote:

> Hi,
> 
> Julian Moritz schrieb:
>> Hi,
>> 
> 
> I've just found this via google:
> 
>>> We don't parallelize view index creation yet, so this is not an
>>> additional problem for you. You can however build two views in
>>> parallel and make use of two cores that way.
>> 
> 
> If this is (still) true, view index creation is the bottleneck of my
> application. Hence I'm just playing around and yet using 100% of my
> core, I cannot use CouchDB anymore.
> 

We rarely see view generation that is actually limited by view-function execution speed. The majority of the time the actual bottleneck is disk IO. To parallelize view generation the best option is to run a CouchDB-Lounge cluster.

It looks like you might be better off removing your reduce function, which might also speed things up.

Chris


> [...]


Re: performance issues

Posted by Julian Moritz <ma...@julianmoritz.de>.
Hi,

Julian Moritz schrieb:
> Hi,
> 

I've just found this via google:

>> We don't parallelize view index creation yet, so this is not an
>> additional problem for you. You can however build two views in
>> parallel and make use of two cores that way.
>

If this is (still) true, view index creation is the bottleneck of my
application. Since I'm just playing around and am already using 100% of
one core, I can't really use CouchDB anymore.

Regards
Julian

> [...]