Posted to user@couchdb.apache.org by Thomas Hommers <th...@ebalu.com> on 2011/10/21 06:06:20 UTC

Doc design / performance

Hi,

I am quite new to CouchDB and am trying to build a sales application.

I designed a product document. Each product consists of multiple sub-products that are unique to that product.
Next I designed a sales document that consists of multiple products. The quantity of each sub-product can be chosen independently.

To see the total sales quantity, I created a view that runs through all sales docs and emits the sold quantity, with the product and sub-product numbers as keys. With a reduce function, this lets me see the sold quantity by product and by sub-product.
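
The view described above might look roughly like the following. This is only a sketch in plain Python that imitates what a map function plus a summing reduce would compute; the document layout and the field names (type, items, product, sub_product, qty) are assumptions, since the real documents were not posted.

```python
from collections import defaultdict

# Hypothetical sales documents; the real field names are unknown.
sales_docs = [
    {"_id": "sale-1", "type": "sale", "items": [
        {"product": "P1", "sub_product": "P1-a", "qty": 2},
        {"product": "P1", "sub_product": "P1-b", "qty": 1},
    ]},
    {"_id": "sale-2", "type": "sale", "items": [
        {"product": "P1", "sub_product": "P1-a", "qty": 3},
        {"product": "P2", "sub_product": "P2-a", "qty": 5},
    ]},
]

def map_sale(doc):
    """Yield ((product, sub_product), qty) rows, like emit() in a map function."""
    if doc.get("type") != "sale":
        return
    for item in doc["items"]:
        yield (item["product"], item["sub_product"]), item["qty"]

def grouped_sum(docs, group_level):
    """Sum values per key prefix, imitating a sum reduce queried with group_level."""
    totals = defaultdict(int)
    for doc in docs:
        for key, qty in map_sale(doc):
            totals[key[:group_level]] += qty
    return dict(totals)

by_product = grouped_sum(sales_docs, 1)      # keys truncated to (product,)
by_sub_product = grouped_sum(sales_docs, 2)  # full (product, sub_product) keys
print(by_product)
print(by_sub_product)
```

Emitting an array key lets one view answer both questions: grouping on the first key element gives totals per product, grouping on both gives totals per sub-product.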

The problem I am facing is that it takes a long time to display an overview of all quantities.
Did I design something wrong, and should I take another approach? For example, should I create a doc for each sub-product instead of having them all in one product doc? Would this be faster?

I would be really thankful for any advice, hints or comments.

Regards
Thomas

Re: Doc design / performance

Posted by Jan Lehnardt <ja...@apache.org>.
Can you give us some numbers? And show us some code? :)

Cheers
Jan

Re: Doc design / performance

Posted by Robert Newson <rn...@apache.org>.
Hi Thomas,

All the views in a design document are built at the same time and
written to the same file. So one big difference between having one
design document versus several is that having several allows views to
be built in parallel. Depending on your server hardware this could be a
very significant difference. Another is that modifying one view in a
design document will invalidate all views in that document (they will
all be rebuilt).
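
To illustrate the point about separate design documents, here is a sketch of two of them built as Python dictionaries. The names (sales_totals, sales_lookup, by_product, by_date) and the document fields used in the map functions are made up for the example; only the overall shape (_id starting with _design/, a views object with map and optional reduce) is the standard design document layout.

```python
import json

# JavaScript source is stored as plain strings inside a design document.
# Field names (type, items, date, ...) are assumptions for illustration.
MAP_QTY = (
    "function (doc) {"
    " if (doc.type === 'sale') {"
    " doc.items.forEach(function (i) {"
    " emit([i.product, i.sub_product], i.qty); }); } }"
)
MAP_BY_DATE = (
    "function (doc) {"
    " if (doc.type === 'sale') { emit(doc.date, null); } }"
)

def design_doc(name, views):
    """Build a design document body with the given views."""
    return {"_id": "_design/" + name, "language": "javascript", "views": views}

# Two design documents: each gets its own view index file, so changing
# one of them does not force a rebuild of the other.
totals_ddoc = design_doc(
    "sales_totals", {"by_product": {"map": MAP_QTY, "reduce": "_sum"}}
)
lookup_ddoc = design_doc("sales_lookup", {"by_date": {"map": MAP_BY_DATE}})

for ddoc in (totals_ddoc, lookup_ddoc):
    print(json.dumps(ddoc, indent=2))
```

Views that always change together can reasonably share a design document; a view that is edited often is a candidate for its own.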

The execution time of the map function is rarely the bottleneck; view
builds are dominated by the I/O of reading and writing.

HTH,
B.

On 27 October 2011 16:38, CGS <cg...@gmail.com> wrote:
> At point 3:
>
> As long as you name each view, there shouldn't be any difference between
> creating several design documents and having a single one (though a
> single design document would help limit the number of documents in the
> database). That's because access is based on the path, and the view is
> built at the time of the first request (and updated when the database
> changes).
>
> A more important thing is to optimize your views by reducing the
> operations in them to exactly what you need. Don't forget that the view
> functions are applied to each document in the database. So defining
> more views (even in the same design document) with more specific tasks
> can speed up the build (provided you don't use them recursively for the
> same operation, because that would slow down the overall process). If
> one operation needs more than one dedicated task, you can define a
> multiple-choice view (e.g., one view with an if/else statement for
> simple conditions).
>
> Other than these, I don't know of any other recommendations.
>
> CGS

Re: Doc design / performance

Posted by CGS <cg...@gmail.com>.
At point 3:

As long as you name each view, there shouldn't be any difference between
creating several design documents and having a single one (though a
single design document would help limit the number of documents in the
database). That's because access is based on the path, and the view is
built at the time of the first request (and updated when the database
changes).

A more important thing is to optimize your views by reducing the
operations in them to exactly what you need. Don't forget that the view
functions are applied to each document in the database. So defining
more views (even in the same design document) with more specific tasks
can speed up the build (provided you don't use them recursively for the
same operation, because that would slow down the overall process). If
one operation needs more than one dedicated task, you can define a
multiple-choice view (e.g., one view with an if/else statement for
simple conditions).
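
A "multiple-choice" view of the kind mentioned above could be sketched like this. It is plain Python standing in for a map function, and the document types and field names (type, items, product, qty) are assumptions for the example only.

```python
def map_doc(doc):
    """One map function that branches on the document type.

    Each yielded (key, value) pair stands in for an emit(key, value) call.
    """
    if doc.get("type") == "sale":
        for item in doc.get("items", []):
            yield ["sold", item["product"]], item["qty"]
    elif doc.get("type") == "product":
        yield ["product", doc["_id"]], 1
    # Any other document type emits nothing and costs almost nothing.

# Hypothetical documents of mixed types.
docs = [
    {"_id": "P1", "type": "product"},
    {"_id": "sale-1", "type": "sale", "items": [
        {"product": "P1", "qty": 2},
        {"product": "P2", "qty": 1},
    ]},
    {"_id": "note-1", "type": "note"},
]

rows = [row for doc in docs for row in map_doc(doc)]
for key, value in rows:
    print(key, value)
```

Prefixing the key with a tag such as "sold" or "product" keeps the rows from the two branches in separate key ranges of the same view.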

Other than these, I don't know of any other recommendations.

CGS




On 10/27/2011 12:23 PM, Thomas Hommers wrote:
> Hi,
>
> Thanks for all the advice.
> I know my question might have been a bit "general", but I think some of the correspondingly "general" advice I received led me in the right direction.
>
> About the performance:
>
> 1. After some research I found an error "127" in the logs that had something to do with the SpiderMonkey installation. After solving this, the speed increased immediately.
>
> 2. I will try to split the data into different databases.
>
> 3. Maybe I should separate some views into their own design documents. Is there any best practice for when to do that, and for which views should stay in one design doc and which should have their own?
>
> 4. I used the built-in _sum as the reduce function and think this increased the speed too.
>
> 5. I need to look into python-couchdb, as I just read there is also a bug that causes speed issues.
>
> Regards
> Thomas


RE: Doc design / performance

Posted by Thomas Hommers <th...@ebalu.com>.
Hi,

Thanks for all the advice.
I know my question might have been a bit "general", but I think some of the correspondingly "general" advice I received led me in the right direction.

About the performance:

1. After some research I found an error "127" in the logs that had something to do with the SpiderMonkey installation. After solving this, the speed increased immediately.

2. I will try to split the data into different databases.

3. Maybe I should separate some views into their own design documents. Is there any best practice for when to do that, and for which views should stay in one design doc and which should have their own?

4. I used the built-in _sum as the reduce function and think this increased the speed too.

5. I need to look into python-couchdb, as I just read there is also a bug that causes speed issues.
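
On point 4, the built-in _sum reduce runs inside the CouchDB server itself instead of being passed to the JavaScript query server, which is a likely reason for the speed-up. What it computes is the same as a hand-written summing reduce. The sketch below imitates that in Python, including the rereduce step; the sample values and the way they are split into chunks are made up.

```python
def reduce_sum(keys, values, rereduce):
    """Summing reduce with the (keys, values, rereduce) argument shape.

    For a plain sum the first pass and the rereduce pass do the same work,
    which is why a sum is safe to use as a reduce function.
    """
    return sum(values)

# Hypothetical emitted quantities, reduced in two chunks as the
# view index might do internally.
values = [2, 1, 3, 5]
partials = [
    reduce_sum(None, values[:2], False),
    reduce_sum(None, values[2:], False),
]

# The partial results are then combined in a rereduce call.
total = reduce_sum(None, partials, True)
print(partials, total)
```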

Regards
Thomas
________________________________________
From: Dave Cottlehuber [dave@muse.net.nz]
Sent: Friday, October 21, 2011 6:38 PM
To: user@couchdb.apache.org
Subject: Doc design / performace


Doc design / performance

Posted by Dave Cottlehuber <da...@muse.net.nz>.
On 21 October 2011 06:06, Thomas Hommers <th...@ebalu.com> wrote:
> Hi,
>
> I am quite new to CouchDB and am trying to build a sales application.

Welcome!

> I designed a product document. Each product consists of multiple
> sub-products that are unique to that product.
> Next I designed a sales document that consists of multiple products. The
> quantity of each sub-product can be chosen independently.

Does 1 doc = 1 product incl. all sub-products? Or are there references within
1 doc to another, e.g. to use via include_docs=true? Perhaps you can post a
link to a few sample docs for us to look at.

> To see the total sales quantity, I created a view that runs through all
> sales docs and emits the sold quantity, with the product and sub-product
> numbers as keys. With a reduce function, this lets me see the sold quantity
> by product and by sub-product.

Ditto. This seems reasonable, and given that once the view is built, the
intermediate values are cached in the view file, this shouldn't be too slow.

> The problem I am facing is that it takes a long time to display an
> overview of all quantities.

What is your expectation vs what you saw? How are you querying the view?
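
How the view is queried matters because a grouped reduce query returns a handful of rows, while an ungrouped or reduce=false query can return every emitted row. As a sketch, a query URL could be assembled like this in Python. The host, database, design document and view names are placeholders; group_level, startkey and endkey are standard view query parameters whose values are JSON-encoded.

```python
import json
from urllib.parse import urlencode

def view_url(base, db, ddoc, view, **params):
    """Build a view query URL; parameter values are JSON-encoded."""
    query = urlencode({k: json.dumps(v) for k, v in params.items()})
    path = f"{base}/{db}/_design/{ddoc}/_view/{view}"
    return f"{path}?{query}" if query else path

# Totals per product: group on the first element of the [product, sub] key.
per_product = view_url(
    "http://localhost:5984", "sales", "sales_totals", "by_product",
    group_level=1,
)

# Totals per sub-product, restricted to one hypothetical product "P1".
per_sub_product = view_url(
    "http://localhost:5984", "sales", "sales_totals", "by_product",
    group_level=2, startkey=["P1"], endkey=["P1", {}],
)

print(per_product)
print(per_sub_product)
```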

> Did I design something wrong, and should I take another approach? For
> example, should I create a doc for each sub-product instead of having them
> all in one product doc? Would this be faster?

> I would be really thankful for any advice, hints or comments.
>
> Regards
> Thomas

Generally,

Re: Doc design / performance

Posted by CGS <cg...@gmail.com>.
I tend to agree with Frederick on this one. "I don't know" seems to
be the real answer here.

Nevertheless, he touched on some valuable general design points which,
in your case, can be summed up as "divide et impera" (if I understood
your problem correctly). That means:
1. at the database level, try to achieve better granularity (by
designing smaller databases, each with as close to an optimal number of
documents as possible);
2. at the view level, use pagination (10 sub-products per page would
allow enough time to build all the views before the user hits the button
for the next page).
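
The pagination in point 2 can be sketched as follows. This Python function works on an in-memory list of already sorted rows and imitates a view queried with the limit and startkey parameters; it fetches one extra row to learn where the next page starts. The row data is made up, and the sketch assumes every key is unique.

```python
def fetch_page(rows, page_size, startkey=None):
    """Return (page_rows, next_startkey) from rows sorted by key.

    Imitates querying a view with limit=page_size+1 and startkey=...;
    the extra row is not shown, its key is where the next page begins.
    """
    candidates = [r for r in rows if startkey is None or r[0] >= startkey]
    window = candidates[: page_size + 1]
    page = window[:page_size]
    next_key = window[page_size][0] if len(window) > page_size else None
    return page, next_key

# 25 hypothetical (key, value) rows, already in key order.
rows = [(f"sub-{i:02d}", i) for i in range(25)]

page1, next1 = fetch_page(rows, 10)
page2, next2 = fetch_page(rows, 10, startkey=next1)
page3, next3 = fetch_page(rows, 10, startkey=next2)
print(len(page1), next1, len(page2), next2, len(page3), next3)
```

Using the key of the extra row as the next startkey avoids counting past earlier rows on every request, which a skip-based approach would have to do.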

Keep in mind that you are the one who knows your project requirements
best. If you can't find a way, nobody else can find it for you unless
that person is really into your project.


On 10/21/2011 08:29 AM, Frederick Dalgleish wrote:
> The real answer I have for you is "I don't know."
>
> The other answer is a bunch of generalities, some of which may be even true, all of which you
> probably have already considered....but if not, here goes.  All of this is worth what you paid for it....the advice that is.
>
> The number one rule of design is to make the granularity of your design such that each document is about
> equal to an object in an object oriented programming language, let's say, Objective C for kicks and grins.
>
> So an object might be a dog.  A dog is a product in your schema.  A dog or a product might have characteristics, like objects do.
> Dogs have sizes, colors and tail lengths.  Products have prices, quantities and maybe colors or something.
> The product is a document.  The fields are the characteristics.
>
> The second rule of design is that a million items in CB won't process as fast as say, 10 items.  So feel free to have a bunch of
> databases with fewer numbers of objects in them.  Consider scaling up to more servers.  Consider getting better advice than mine.
> That won't be difficult.
>
> The third rule of design is to remember the machine and/or the CB product you are using.  A CB product which caches some of
> the more frequently used or frequently somethinged items in RAM, rather than forcing all the fun to and from the disk which is spinning
> as fast as that poor disk is able, trying to keep up with your app....will be faster than otherwise.  So consider Membase related products.
>
> The fourth rule of design has nothing to do with design.  It isn't really a rule.  It just says that things go faster when the machine has more
> cycles per second, more processors, higher bus speeds, higher bandwidth to and from the server, more RAM, and a billion other little things.
> All the external to the server stuff can be ruled out if you are using localhost, probably.
>
> So, if your machine is the cat's meow and your budget for software and CB help instances is limitless, you are certain to figure this out.
>
> Cheers.  FD
>
> PS  If you get an answer to your question that really rocks, please consider sharing it with me (us).


Re: Doc design / performance

Posted by Frederick Dalgleish <da...@gmail.com>.
The real answer I have for you is "I don't know."

The other answer is a bunch of generalities, some of which may be even true, all of which you
probably have already considered....but if not, here goes.  All of this is worth what you paid for it....the advice that is.

The number one rule of design is to make the granularity of your design such that each document is about
equal to an object in an object oriented programming language, let's say, Objective C for kicks and grins.

So an object might be a dog.  A dog is a product in your schema.  A dog or a product might have characteristics, like objects do.
Dogs have sizes, colors and tail lengths.  Products have prices, quantities and maybe colors or something.
The product is a document.  The fields are the characteristics.
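
As a rough picture of the two granularities under discussion, the sketch below shows one product document with embedded sub-products and a helper that splits it into one document per sub-product. All field names and the id scheme are invented for the example, and it makes no claim about which layout is faster.

```python
# Hypothetical product document with its sub-products embedded.
product = {
    "_id": "P1",
    "type": "product",
    "name": "Example product",
    "sub_products": [
        {"number": "a", "color": "red"},
        {"number": "b", "color": "blue"},
    ],
}

def split_product(product):
    """Return one product doc plus one doc per sub-product."""
    docs = [{
        "_id": product["_id"],
        "type": "product",
        "name": product["name"],
    }]
    for sub in product["sub_products"]:
        docs.append({
            "_id": f'{product["_id"]}:{sub["number"]}',
            "type": "sub_product",
            "product_id": product["_id"],  # reference back to the parent
            **sub,
        })
    return docs

split_docs = split_product(product)
for doc in split_docs:
    print(doc)
```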

The second rule of design is that a million items in CB won't process as fast as say, 10 items.  So feel free to have a bunch of
databases with fewer numbers of objects in them.  Consider scaling up to more servers.  Consider getting better advice than mine.
That won't be difficult.

The third rule of design is to remember the machine and/or the CB product you are using.  A CB product which caches some of
the more frequently used or frequently somethinged items in RAM, rather than forcing all the fun to and from the disk which is spinning
as fast as that poor disk is able, trying to keep up with your app....will be faster than otherwise.  So consider Membase related products.

The fourth rule of design has nothing to do with design.  It isn't really a rule.  It just says that things go faster when the machine has more
cycles per second, more processors, higher bus speeds, higher bandwidth to and from the server, more RAM, and a billion other little things.
All the external to the server stuff can be ruled out if you are using localhost, probably.

So, if your machine is the cat's meow and your budget for software and CB help instances is limitless, you are certain to figure this out.

Cheers.  FD

PS  If you get an answer to your question that really rocks, please consider sharing it with me (us).








Re: Doc design / performance

Posted by bsquared <bw...@gmail.com>.

Recently there was a post to this group about using s3 paths. That may
be of some interest to you.


I'm sorry that I don't have a direct link for you.
-- 
Regards,
Brian Winfrey