Posted to solr-user@lucene.apache.org by Michael Ryan <mr...@moreover.com> on 2011/08/25 05:35:57 UTC

Optimize requires 50% more disk space when there are exactly 20 segments

I'm using Solr 3.2 with a mergeFactor of 10 and no merge policy configured, thus using the default LogByteSizeMergePolicy.  Before I do an optimize, typically the largest segment will be about 90% of the total index size.

When I do an optimize, the total disk space required is usually about 2x the index size.  But about 10% of the time, the disk space required is about 3x the index size. When this happens, I see a very large segment created, roughly the size of the original index, followed by another slightly larger segment.

After some investigating, I found that this would happen when there were exactly 20 segments in the index when the optimize started.  My hypothesis is that this is a side-effect of the 20 segments being evenly divisible by the mergeFactor of 10.  I'm thinking that when there are 20 segments, the largest segment is being merged twice - first when merging the 20 segments down to 2, then again when merging from 2 to 1.
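To make the hypothesis concrete, here is a rough simulation of the cascade described above. It is only a simplified model, not Lucene's actual merge code. It assumes that optimize merges the trailing mergeFactor segments in batches, that it does one final merge when no full batch is left, and that old segments stay on disk until the optimize finishes. The function name and the segment sizes are made up for illustration.

```python
def simulate_optimize(segments, merge_factor=10, max_segments=1):
    """Return (bytes_written, peak_disk / original_size) for a simplified optimize.

    Assumed model (not taken from the Lucene source):
      * each round merges batches of `merge_factor` segments, starting at the tail;
      * if no full batch fits, the trailing segments are merged in one final step;
      * nothing is deleted until the end, so peak disk = original + everything written.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    segments = list(segments)
    original = sum(segments)
    written = 0
    while len(segments) > max_segments:
        last = len(segments)
        merged_tail = []
        # Merge full batches of merge_factor segments, working backwards from the end.
        while last - max_segments + 1 >= merge_factor:
            batch = segments[last - merge_factor:last]
            merged_tail.insert(0, sum(batch))
            written += sum(batch)
            last -= merge_factor
        if not merged_tail:
            # No full batch: merge just enough trailing segments to reach max_segments.
            start = max_segments - 1
            written += sum(segments[start:])
            segments = segments[:start] + [sum(segments[start:])]
        else:
            segments = segments[:last] + merged_tail
    return written, (original + written) / original


# Hypothetical index: one large segment (about 90% of the total) plus small ones.
twenty = [905] + [5] * 19    # exactly 20 segments
nineteen = [905] + [5] * 18  # 19 segments

print(simulate_optimize(twenty))    # large segment is rewritten twice
print(simulate_optimize(nineteen))  # large segment is rewritten once
```

Under these assumptions the 20-segment case rewrites the large segment twice (20 to 2, then 2 to 1) and peaks at 3x the index size, while the 19-segment case rewrites it once and peaks at roughly 2x, which matches what I'm seeing.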

I would like to avoid this if at all possible, as it requires 50% more disk space and takes almost twice as long to optimize.  Would using TieredMergePolicy help me here, or some other config I can change?

-Michael

Re: Optimize requires 50% more disk space when there are exactly 20 segments

Posted by Lance Norskog <go...@gmail.com>.
Which Solr version do you have? In 3.x and trunk, TieredMergePolicy and
BalancedSegmentMergePolicy are there for exactly this reason.

In Solr 1.4, your only trick is to do a partial optimize with
maxSegments. This lets you say "optimize until there are 15 segments,
then stop". Do this with smaller and smaller numbers.
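A rough sketch of that stepped approach, assuming the standard update handler with its optimize and maxSegments request parameters. The host, port, path, and the particular step sequence are placeholders, not recommendations; adjust them for your setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder URL for a local single-core Solr; change it to match your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def optimize_url(max_segments, base_url=SOLR_UPDATE_URL):
    """Build the request URL for a partial optimize down to max_segments."""
    params = {"optimize": "true", "maxSegments": max_segments}
    return base_url + "?" + urlencode(params)


def stepped_optimize_urls(steps=(15, 10, 5, 1)):
    """URLs for a series of partial optimizes with smaller and smaller targets.

    The default steps are an arbitrary example.
    """
    return [optimize_url(n) for n in steps]


def run_stepped_optimize(steps=(15, 10, 5, 1)):
    """Send each partial optimize in turn (not called here; needs a running Solr)."""
    for url in stepped_optimize_urls(steps):
        with urlopen(url) as response:
            response.read()


for url in stepped_optimize_urls():
    print(url)
```

Each request only merges down to the given segment count, so no single step has to rewrite the whole index more than once.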

-- 
Lance Norskog
goksron@gmail.com