Posted to dev@lucene.apache.org by Bernhard Messer <Be...@intrafind.de> on 2004/08/07 14:12:53 UTC
possible SegmentMerger optimization
Hi developers,
there may be a small but effective way to optimize the SegmentMerger
class when the compound file option is enabled, which has been the
default since Lucene 1.4.
The current implementation creates and writes the compound index file
every time the merge() method is called. Because I/O operations are
expensive and time-consuming, it would be better to write the compound
index file only when optimizing the index. The change itself wouldn't
be a big deal: add a boolean parameter, as in
SegmentMerger.merge(boolean finalize). Only if finalize==true and the
compound option is enabled will the compound file be created. To
complete the implementation, the same parameter could be added to
mergeSegments(int minSegment, boolean finalize) within IndexWriter. When
mergeSegments is called from flushRamSegments() or maybeMergeSegments(),
finalize is set to false. Only when it is called from optimize() will
finalize be set to true and the compound file be written.
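The proposal above can be illustrated with a small, self-contained toy
model. This is only a sketch under stated assumptions: ToySegmentMerger,
ToyIndexWriter and the mergeFactor-style threshold are hypothetical
stand-ins that count work instead of writing files, and they are not
Lucene's own classes or the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for SegmentMerger; it only counts work, it writes no files. */
class ToySegmentMerger {
    private final boolean useCompoundFile;
    int merges = 0;          // number of merge() calls
    int compoundWrites = 0;  // number of times a compound file would be built

    ToySegmentMerger(boolean useCompoundFile) {
        this.useCompoundFile = useCompoundFile;
    }

    /** Merges segment names; builds the compound file only when finalize is true. */
    String merge(List<String> segments, boolean finalize) {
        merges++;
        if (useCompoundFile && finalize) {
            compoundWrites++; // the expensive step the proposal wants to defer
        }
        return String.join("+", segments);
    }
}

/** Toy stand-in for IndexWriter with a hypothetical mergeFactor-style threshold. */
class ToyIndexWriter {
    private final ToySegmentMerger merger = new ToySegmentMerger(true);
    private final List<String> segments = new ArrayList<>();
    private final int mergeFactor;

    ToyIndexWriter(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    void addDocument(String doc) {
        segments.add(doc);
        if (segments.size() >= mergeFactor) {
            mergeSegments(false); // intermediate merge, as from maybeMergeSegments()
        }
    }

    void optimize() {
        mergeSegments(true); // final merge: the only place the compound file is built
    }

    private void mergeSegments(boolean finalize) {
        String merged = merger.merge(segments, finalize);
        segments.clear();
        segments.add(merged);
    }

    int merges() { return merger.merges; }
    int compoundWrites() { return merger.compoundWrites; }
}

class FinalizeFlagSketch {
    public static void main(String[] args) {
        ToyIndexWriter writer = new ToyIndexWriter(3);
        for (int i = 1; i <= 7; i++) {
            writer.addDocument("doc" + i);
        }
        writer.optimize();
        System.out.println(writer.merges() + " merges, "
                + writer.compoundWrites() + " compound file write(s)");
    }
}
```

With a threshold of 3 and seven documents, the toy model performs four
merges but builds the compound file only once, in optimize().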
The downside is having to explain to developers that if they don't
optimize the index before closing it, the compound file option has no
effect. The other issue is that we might run into the "too many open
files" problem, which was sometimes reported before the compound option
was introduced.
These drawbacks could be addressed by making the optimization optionally
available through IndexWriter, so developers using Lucene could decide
for themselves whether they want to use the "single compound write"
option or not.
If you would like to see the patch, leave me a note and I'll create it.
best regards
Bernhard
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: possible SegmentMerger optimization
Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
Bernhard Messer wrote:
> Dmitry,
>
> Yep, you're right. Switching the compound file option off and on
> would be the trick to simulate the behavior I described. I ran some
> tests on that and found that it works perfectly.
Great! I'm glad that helps with your issue. By the way, I like what you
did with reducing disk size requirements. That sounds like a great idea!
Thanks for taking this on. :)
Dmitry.
Re: possible SegmentMerger optimization
Posted by Bernhard Messer <Be...@intrafind.de>.
Dmitry,
Yep, you're right. Switching the compound file option off and on would
be the trick to simulate the behavior I described. I ran some tests on
that and found that it works perfectly. I think we can leave everything
as it is, but maybe we should document it somewhere.
Is there something like a "tips and tricks" section on the Lucene
website?
Bernhard
Re: possible SegmentMerger optimization
Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
Bernhard Messer wrote:
> Hi developers,
>
> there may be a small but effective way to optimize the SegmentMerger
> class when the compound file option is enabled, which has been the
> default since Lucene 1.4.
>
> The current implementation creates and writes the compound index file
> every time the merge() method is called. Because I/O operations are
> expensive and time-consuming, it would be better to write the compound
> index file only when optimizing the index. The change itself wouldn't
> be a big deal: add a boolean parameter, as in
> SegmentMerger.merge(boolean finalize). Only if finalize==true and the
> compound option is enabled will the compound file be created. To
> complete the implementation, the same parameter could be added to
> mergeSegments(int minSegment, boolean finalize) within IndexWriter.
> When mergeSegments is called from flushRamSegments() or
> maybeMergeSegments(), finalize is set to false. Only when it is called
> from optimize() will finalize be set to true and the compound file be
> written.
>
> The downside is having to explain to developers that if they don't
> optimize the index before closing it, the compound file option has no
> effect. The other issue is that we might run into the "too many open
> files" problem, which was sometimes reported before the compound
> option was introduced.
Yeah, that was kind of the point of having compound files: to avoid
too many open file handles, especially during indexing. I hear you on
the inefficient use of disk I/O, though.
>
> These drawbacks could be addressed by making the optimization
> optionally available through IndexWriter, so developers using Lucene
> could decide for themselves whether they want to use the "single
> compound write" option or not.
One could do that today. Just call setUseCompoundFile(false) during
indexing and then setUseCompoundFile(true) before the final optimize.
Would that do the trick?
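The call order suggested here can be sketched as follows. To keep the
example self-contained, WriterLike is a hypothetical interface standing
in for the few IndexWriter calls involved, and RecordingWriter only logs
calls rather than building an index. In Lucene 1.4 the corresponding
IndexWriter method is, to my knowledge, named setUseCompoundFile(boolean)
(singular), and the real methods throw IOException, which this sketch
leaves out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the few IndexWriter calls the pattern needs. */
interface WriterLike {
    void setUseCompoundFile(boolean value);
    void addDocument(String doc);
    void optimize();
    void close();
}

/** Records calls so their order can be inspected; it is not a real index. */
class RecordingWriter implements WriterLike {
    final List<String> calls = new ArrayList<>();

    public void setUseCompoundFile(boolean value) {
        calls.add("setUseCompoundFile(" + value + ")");
    }
    public void addDocument(String doc) { calls.add("addDocument"); }
    public void optimize() { calls.add("optimize"); }
    public void close() { calls.add("close"); }
}

class CompoundToggleSketch {
    /** Indexes without compound files, then enables them for the final optimize. */
    static void bulkIndex(WriterLike writer, List<String> docs) {
        writer.setUseCompoundFile(false); // intermediate merges skip the compound file
        try {
            for (String doc : docs) {
                writer.addDocument(doc);
            }
            writer.setUseCompoundFile(true); // only the final merge builds it
            writer.optimize();
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) {
        RecordingWriter writer = new RecordingWriter();
        bulkIndex(writer, Arrays.asList("a", "b"));
        System.out.println(writer.calls);
    }
}
```

The trade-off raised earlier in the thread still applies: while the
compound option is off, each segment keeps its separate files, so more
file handles may be open during indexing.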
Dmitry.
>
> If you would like to see the patch, leave me a note and I'll create
> it.
>
> best regards
> Bernhard