Posted to dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2011/11/04 18:57:38 UTC

Re: [Solr Wiki] Update of "SolrPerformanceFactors" by RobertMuir

Completely removing all of this info seems like more harm than good -- it 
actually advises against doing an optimize except when you know you're 
never going to modify your index, and it explains the downsides of 
optimizing.

I would suggest we add most of this back, but perhaps change the title 
(since many pieces of info in this section aren't specific to 
optimizing, they're just about segments) and be more vigorous in warning 
about the costs of optimize.

: The "SolrPerformanceFactors" page has been changed by RobertMuir:
: http://wiki.apache.org/solr/SolrPerformanceFactors?action=diff&rev1=28&rev2=29
: 
: Comment:
: die optimize die
: 
:   
:      * Memory usage during indexing
:      * Segment merge time
: -    * Optimization times
:      * Index size
:   
:   These impacts can be reduced by the use of `omitNorms="true"`
: @@ -74, +73 @@
: 
:   === Explicit Warming of Sort Fields ===
:   
:   If you do a lot of field based sorting, it is advantageous to add explicitly warming queries to the "newSearcher" and "firstSearcher" event listeners in your solrconfig which sort on those fields, so the !FieldCache is populated prior to any queries being executed by your users.
: - 
: - == Optimization Considerations ==
: - 
: - You may want to optimize an index whenever practical -- ie: if you build your index once, and then never modify it.
: - 
: - If your index is receiving a steady stream of modifications, then consider the following factors...
: - 
: -    * As more segments are added to the index, query performance will degrade slightly.  Automatic segment merging by Lucene will set an upper bound on the number of segments created though.
: -    * Auto-warming time will grow since it's normally dependent on doing searches. 
: -    * The first distribution after an optimization will take longer than subsequent ones. See [[CollectionDistribution|Collection Distribution]] for more information.
: -    * During optimization the file size of the index doubles, but returns to its original size or even slightly less.
: -    * If you can, make sure that you do not have multiple concurrent producers of documents calling commit(). Multiple concurrent commits will cause a large performance degradation. 
: - 
: - Since optimizing an index merges all the segments in an index (about 7 files per segment) into a single segment, optimizing an index helps avoid the "too many open files" problem, i.e. running out of file descriptors, which is mentioned in an [[http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html#indexing_speed|ONJava Article]].
:   
:   == Updates and Commit Frequency Tradeoffs ==
:   
: 
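
To put rough numbers on the file-descriptor and disk-space points in the removed text, here is a minimal Python sketch. The figure of about 7 files per segment comes from the wiki text itself; the function names and the factor of 2 for transient disk usage during an optimize are illustrative assumptions drawn from the bullets above, not Lucene or Solr APIs, and real numbers vary by version and configuration.

```python
# Rough figure quoted in the wiki text; an assumption, not a Lucene constant.
FILES_PER_SEGMENT = 7


def estimated_index_files(segment_count, files_per_segment=FILES_PER_SEGMENT):
    """Estimate how many files make up an index with the given segment count."""
    if segment_count < 0:
        raise ValueError("segment_count must be non-negative")
    return segment_count * files_per_segment


def peak_disk_during_optimize(index_bytes):
    """Estimate transient disk usage while old and merged segments coexist.

    Assumes the index roughly doubles during the optimize, as the wiki
    text describes, before shrinking back to about its original size.
    """
    if index_bytes < 0:
        raise ValueError("index_bytes must be non-negative")
    return 2 * index_bytes
```

Under these assumptions, an index that has grown to 30 segments involves roughly 210 files, while a fully optimized index involves roughly 7, which is why optimizing eases file-descriptor pressure at the cost of temporarily needing about twice the disk space.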

-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: [Solr Wiki] Update of "SolrPerformanceFactors" by RobertMuir

Posted by Mark Miller <ma...@gmail.com>.
I've tweaked what was there - some of the info was a bit ... fugly ... so I kept the main point about optimize being useful for a static index, added a couple other bits, and it's open for someone else to add back anything else they might think is useful but was cut.

On Nov 5, 2011, at 8:42 AM, Mark Miller wrote:

> +1 - this is good info. The first thing it says is "You may want to optimize an index whenever practical -- ie: if you build your index once, and then never modify it."
> 
> True stuff. At worst we should clarify practical a bit more - but I still don't think it's bad as is.
> 
> Then it gives you further good info.
> 
> What is the point of removing it?
> 
> - Mark

- Mark Miller
lucidimagination.com


Re: [Solr Wiki] Update of "SolrPerformanceFactors" by RobertMuir

Posted by Mark Miller <ma...@gmail.com>.
+1 - this is good info. The first thing it says is "You may want to optimize an index whenever practical -- ie: if you build your index once, and then never modify it."

True stuff. At worst we should clarify practical a bit more - but I still don't think it's bad as is.

Then it gives you further good info.

What is the point of removing it?

- Mark
 

- Mark Miller
lucidimagination.com