Posted to user@couchdb.apache.org by Chris Hicks <si...@hotmail.com> on 2010/05/08 00:03:23 UTC

Best views performance

If I have a DB of, let's say, 100K+ documents, each with 20 fields, would it be more efficient to have a single view that indexes every document and run more complex queries over it, or would it be better to have a number of smaller views, each covering only a few fields, and run much simpler queries on each of them for whatever data is needed at that moment? As with most things, the answer is probably "it depends." If there is no easy answer, could anyone tell me the pros and cons of each approach?
Chris Hicks 		 	   		  
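
To make the two options in the question concrete, here is a minimal sketch of what each layout might look like as a CouchDB design document, written as Python dicts ready to be serialized to JSON. The design document names, view names, and field names (status, owner, created) are hypothetical; the thread does not name any of the 20 fields.

```python
import json

# Hypothetical names throughout; nothing here comes from the original thread.

# Option A: one "wide" view that emits every document in full.
wide_ddoc = {
    "_id": "_design/wide",
    "views": {
        "all_fields": {
            "map": "function (doc) { emit(doc._id, doc); }"
        }
    },
}

# Option B: several "narrow" views, each emitting only what one query needs.
narrow_ddoc = {
    "_id": "_design/narrow",
    "views": {
        "by_status": {
            "map": "function (doc) { if (doc.status) emit(doc.status, doc.owner); }"
        },
        "by_created": {
            "map": "function (doc) { if (doc.created) emit(doc.created, null); }"
        },
    },
}


def view_names(ddoc):
    """Return the sorted view names defined in a design document."""
    return sorted(ddoc["views"])


if __name__ == "__main__":
    for ddoc in (wide_ddoc, narrow_ddoc):
        print(ddoc["_id"], view_names(ddoc))
        print(json.dumps(ddoc, indent=2))
```

The wide view copies each whole document into the index, while the narrow views store only a key and a small value per row, which is the trade-off the question is asking about.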

RE: Best views performance

Posted by Chris Hicks <si...@hotmail.com>.
Even though they all get calculated at once, does each view in a design doc have its own separate data structure? (I just want to make sure I understand this part fully.) I want to know the implications of using one design doc versus several for a single DB. As for updating the views: since I need all the data to be current for successive reads ASAP, would it be viable to have the HTTP handler call for an index update, or would that delay the response that reports whether the operation succeeded or failed?
Chris Hicks

> From: simonmetson@googlemail.com
> To: user@couchdb.apache.org
> Subject: Re: Best views performance
> Date: Sat, 8 May 2010 11:20:58 -0500
> 
> Glad it helped! Another thing to consider is whether to have one design
> document (which means all views in that ddoc get calculated at once;
> better IO, if I understand right) or to split "hot" views from colder
> ones so that you only index what's needed. Again, it depends on your use
> case and access patterns.
> 
> You could also always query the views with ?stale=ok and have some other
> process update the views. This means your clients will always get a fast
> response, and you can trigger the view indexing when appropriate.
> Cheers
> Simon
 		 	   		  

Re: Best views performance

Posted by Simon Metson <si...@googlemail.com>.
Glad it helped! Another thing to consider is whether to have one design
document (which means all views in that ddoc get calculated at once;
better IO, if I understand right) or to split "hot" views from colder
ones so that you only index what's needed. Again, it depends on your use
case and access patterns.

You could also always query the views with ?stale=ok and have some other
process update the views. This means your clients will always get a fast
response, and you can trigger the view indexing when appropriate.
Cheers
Simon
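
The stale=ok pattern described above could be sketched roughly as follows. This only builds the two URLs involved; it makes no HTTP requests. The base URL, database name, design document name, and view name are all hypothetical. The refresh URL relies on the fact that a normal (non-stale) view query brings the index up to date before it answers, and limit=0 keeps that response small. Newer CouchDB releases document an `update` parameter as the successor to `stale`, so check the documentation for your version.

```python
from urllib.parse import urlencode


def view_url(base, db, ddoc, view, **params):
    """Build the URL for a view query; params become the query string."""
    path = f"{base}/{db}/_design/{ddoc}/_view/{view}"
    return f"{path}?{urlencode(params)}" if params else path


def client_read_url(base, db, ddoc, view):
    """URL a client would use: answer from the existing index, even if stale."""
    return view_url(base, db, ddoc, view, stale="ok")


def refresh_url(base, db, ddoc, view):
    """URL a background process would fetch to trigger indexing.

    No stale parameter, so the server updates the index first;
    limit=0 asks for no rows back.
    """
    return view_url(base, db, ddoc, view, limit=0)


if __name__ == "__main__":
    # Hypothetical server, database, design doc and view names.
    args = ("http://localhost:5984", "mydb", "hot", "by_status")
    print(client_read_url(*args))
    print(refresh_url(*args))
```

A cron job or a small daemon fetching the refresh URL would play the role of the "other process", while clients only ever use the stale read URL. Note that a stale read can return results that do not yet include the latest writes.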

On 8 May 2010, at 10:35, Chris Hicks wrote:

>
> That does help. I am looking for low client-side access times and
> quick updating of the indexes, as in many cases there will be many
> rapid-fire changes to certain documents and I need the index to be
> as up to date as possible. All the other concerns, while always
> important to keep in mind, are lower priority for me on this
> project. I think going the same route as you did is best: break
> everything down into really small bits and only grab the specific
> data needed from a smaller index. Thanks for the reply, Simon.
> Chris Hicks


RE: Best views performance

Posted by Chris Hicks <si...@hotmail.com>.
That does help. I am looking for low client-side access times and quick updating of the indexes, as in many cases there will be many rapid-fire changes to certain documents and I need the index to be as up to date as possible. All the other concerns, while always important to keep in mind, are lower priority for me on this project. I think going the same route as you did is best: break everything down into really small bits and only grab the specific data needed from a smaller index. Thanks for the reply, Simon.
Chris Hicks

> From: simonmetson@googlemail.com
> To: user@couchdb.apache.org
> Subject: Re: Best views performance
> Date: Sat, 8 May 2010 07:56:42 -0500
> 
> Hi,
> 	I think it depends on what you are willing to trade. More indexes can
> mean more disk space used and potentially longer calculation time, but
> more efficient access on the client side (e.g. I'd get the two fields I
> wanted instead of 20). So it depends on your application: do you pay
> through the nose for storage? Do you expect a lot of updates (and hence
> a lot of view indexing), or are things fairly static? Do you have one
> client process or millions?
> 	I tend to try to have a view that indexes one piece of information
> and then reuse it as much as possible with grouping, reduce=false,
> include_docs, etc. I'll only add a view when I know that none of my
> existing ones can be bent to my will. Not sure if that's the best way to
> go, so YMMV...
> Cheers
> Simon
 		 	   		  

Re: Best views performance

Posted by Simon Metson <si...@googlemail.com>.
Hi,
	I think it depends on what you are willing to trade. More indexes can
mean more disk space used and potentially longer calculation time, but
more efficient access on the client side (e.g. I'd get the two fields I
wanted instead of 20). So it depends on your application: do you pay
through the nose for storage? Do you expect a lot of updates (and hence
a lot of view indexing), or are things fairly static? Do you have one
client process or millions?
	I tend to try to have a view that indexes one piece of information
and then reuse it as much as possible with grouping, reduce=false,
include_docs, etc. I'll only add a view when I know that none of my
existing ones can be bent to my will. Not sure if that's the best way to
go, so YMMV...
Cheers
Simon
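
As a rough illustration of reusing one view in several ways, the sketch below builds three different query strings that could be sent to the same view. The parameter names (group, reduce, include_docs, key, limit) are standard CouchDB view query parameters; the key value "open" and the idea that the view has a reduce function are assumptions made only for this example.

```python
import json
from urllib.parse import urlencode


def view_query(**params):
    """Encode view parameters; each value is JSON-encoded before URL-encoding."""
    return urlencode({name: json.dumps(value) for name, value in params.items()})


# 1. Grouped totals from the view's reduce function (assumes one is defined).
totals = view_query(group=True)

# 2. The raw map rows for a single key, skipping the reduce step.
rows = view_query(reduce=False, key="open")

# 3. Map rows with the full documents attached; include_docs cannot be
#    combined with a reduce, hence reduce=False here as well.
docs = view_query(reduce=False, include_docs=True, limit=10)

if __name__ == "__main__":
    for query in (totals, rows, docs):
        print("?" + query)
```

All three queries are served from the same index, so nothing extra has to be built or stored to support them.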

On 7 May 2010, at 17:03, Chris Hicks wrote:

>
> If I have a DB of, let's say, 100K+ documents, each with 20 fields,
> would it be more efficient to have a single view that indexes every
> document and run more complex queries over it, or would it be better
> to have a number of smaller views, each covering only a few fields,
> and run much simpler queries on each of them for whatever data is
> needed at that moment? As with most things, the answer is probably
> "it depends." If there is no easy answer, could anyone tell me the
> pros and cons of each approach?
> Chris Hicks 		 	   		