Posted to java-user@lucene.apache.org by Phil Herold <ph...@d-wise.com> on 2011/02/09 19:58:00 UTC

index size doubling / optimization (Lucene 3.0.3)

I know that the size of a Lucene index can double while an optimization is
underway, but it's supposed to eventually settle back down to roughly the
original size, correct? We have a Lucene index of 100K documents that is
normally about 12GB in size. It is split across 10 sub-indexes, which we
search using a MultiSearcher. It takes our system about 7 hours to traverse
the file system and update the index, which typically adds, updates, or
deletes anywhere from a dozen to a few hundred documents. We optimize each
sub-index at the end of the run (although this is configurable). The system
seems to run fine for several days, with the total size of the index staying
fairly consistent, then all of a sudden the index size doubles to about 25GB
and stays there. I'm assuming this happens after an optimization; there is
certainly not a doubling in the amount of data being added.
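
For context, the update/search setup looks roughly like the sketch below
(the directory handling, analyzer, and class name are placeholders rather
than our exact code):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SubIndexMaintenance {

    // After the nightly update pass, optimize each sub-index separately.
    static void optimizeSubIndexes(File[] subIndexDirs) throws Exception {
        for (File dir : subIndexDirs) {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(dir),
                    new StandardAnalyzer(Version.LUCENE_30),
                    IndexWriter.MaxFieldLength.UNLIMITED);
            try {
                writer.optimize();  // merges the sub-index down to a single segment
            } finally {
                writer.close();     // old segments are only deleted once this completes
            }
        }
    }

    // Searches span all 10 sub-indexes via a single MultiSearcher.
    static MultiSearcher openSearcher(File[] subIndexDirs) throws Exception {
        Searchable[] searchables = new Searchable[subIndexDirs.length];
        for (int i = 0; i < subIndexDirs.length; i++) {
            searchables[i] = new IndexSearcher(FSDirectory.open(subIndexDirs[i]), true); // read-only
        }
        return new MultiSearcher(searchables);
    }
}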

 

Is this expected or known behavior, or a bug of some kind?

 

I've read various postings on the 'net about optimization and when (or
whether) to do it at all, and I'm certainly open to other strategies. Search
time is critical for our users.

 

FWIW, we have the following tunable parameters configured for our index:

 

mergeFactor: 5

maxMergeDocs: 1000

maxBufferedDocs: 200

RAMBufferSizeMB: 16
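
Concretely, those values get applied to each sub-index writer roughly like
this (same 3.0.x imports and placeholder names as the sketch above):

writer.setMergeFactor(5);        // merge once 5 segments accumulate at a level
writer.setMaxMergeDocs(1000);    // largest segment (by doc count) still eligible for merging
writer.setMaxBufferedDocs(200);  // flush after 200 buffered documents...
writer.setRAMBufferSizeMB(16);   // ...or after 16MB of buffered RAM, whichever is hit first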

 

Any advice or help is appreciated. 


Re: index size doubling / optimization (Lucene 3.0.3)

Posted by Michael McCandless <lu...@mikemccandless.com>.
This is not expected.

Did the last IndexWriter exit "gracefully", i.e. did close() complete?  If
so, it should delete the old segments after swapping in the optimized one.

Can you post infoStream output after running optimize?
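
If you haven't wired it up yet, something like this against the 3.0.x API
should capture it on the next run (the log file name is just an example, and
"writer" is your existing IndexWriter for each sub-index):

import java.io.FileOutputStream;
import java.io.PrintStream;

// set before the add/update/delete pass and the optimize() call
PrintStream infoLog = new PrintStream(new FileOutputStream("iw-infostream.log"), true);
writer.setInfoStream(infoLog);
// ... update the sub-index, optimize(), close() ...
infoLog.close();

That will log flush/merge/commit activity we can use to see why the old
segments aren't being dropped.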

Mike

On Wed, Feb 9, 2011 at 1:58 PM, Phil Herold <ph...@d-wise.com> wrote:
> I know that the size of a Lucene index can double while an optimization is
> underway, but it's supposed to eventually settle back down to roughly the
> original size, correct?
>
> [rest of original message snipped]

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org