Posted to java-user@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2008/03/13 14:33:35 UTC
Index Merging Space Requirements
If I use LogByteSizeMergePolicy#setMaxMergeMB, can I clamp down on the
space needed for optimize/merge? My thought is, if a segment is maxed
out, it will never need to be copied for a merge, right? So you could
significantly reduce merge/optimize space requirements (now at like 2x-4x
if readers can still open)?
- Mark
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Index Merging Space Requirements
Posted by Michael McCandless <lu...@mikemccandless.com>.
Well ... yes and no?
Yes, the Log*MergePolicy will still at certain times merge the index
all the way down to one segment. If mergeFactor is 10 then this will
happen at every "power of 10" flushed segments. I.e., after 10 flushes a
merge will merge them down to 1 segment, then after 100 flushes as
well, 1000 flushes, etc. These are done in the background with
ConcurrentMergeScheduler. I don't really like this quality of the
merge policy ... it's sort of a "pay it forward" approach. You are
paying in advance for the expectation that the index is going to keep
getting larger.
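
The "power of 10" cascade described above can be sketched with a toy model in plain Java. This is not Lucene code: the class and method names are made up for illustration, every flush is assumed to produce one equal-sized segment, and a merge is assumed to fire as soon as mergeFactor segments sit at the same level.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical names, not the Lucene API).
final class MergeCascadeSketch {

    /** Returns how many segments remain after the given number of flushes. */
    static int segmentsAfter(int flushes, int mergeFactor) {
        // counts.get(level) = number of live segments at that level
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            addAt(counts, 0); // each flush writes one level-0 segment
            // Cascade: a full level collapses into one segment on the next level.
            for (int level = 0; level < counts.size(); level++) {
                if (counts.get(level) == mergeFactor) {
                    counts.set(level, 0);
                    addAt(counts, level + 1);
                }
            }
        }
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    private static void addAt(List<Integer> counts, int level) {
        while (counts.size() <= level) {
            counts.add(0);
        }
        counts.set(level, counts.get(level) + 1);
    }

    public static void main(String[] args) {
        for (int flushes : new int[] {9, 10, 99, 100, 1000}) {
            System.out.println(flushes + " flushes -> "
                    + segmentsAfter(flushes, 10) + " segment(s)");
        }
    }
}
```

With mergeFactor 10 this model ends up with a single segment after 10, 100 and 1000 flushes, and with 18 segments after 99 flushes, which matches the "merge all the way down at each power of 10" behaviour described in the message.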
However, this merging does respect maxMergeMB, so if you've set that,
then it will not merge down to 1 segment once you have segments over
that size. So in this respect it's different from a real call to
optimize.
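
A rough way to picture the effect of maxMergeMB on these background merges is the toy simulation below. It is a sketch under stated assumptions, not Lucene's implementation: the names are hypothetical, every flush is assumed to write a segment of exactly flushMB, merged size is assumed to be the plain sum of the inputs, and a segment counts as eligible for merging when its size is at most maxMergeMB.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only (hypothetical names, not the Lucene API).
final class CappedMergeSketch {

    /** Returns a map of segment size (MB) to number of segments of that size. */
    static TreeMap<Long, Integer> simulate(int flushes, int mergeFactor,
                                           long flushMB, long maxMergeMB) {
        TreeMap<Long, Integer> bySize = new TreeMap<>();
        for (int i = 0; i < flushes; i++) {
            bySize.merge(flushMB, 1, Integer::sum);
            long size = flushMB;
            // Cascade upward while a full group of eligible segments exists.
            // Segments larger than maxMergeMB are never picked as merge inputs.
            while (size <= maxMergeMB
                    && bySize.getOrDefault(size, 0) >= mergeFactor) {
                bySize.merge(size, -mergeFactor, Integer::sum);
                if (bySize.get(size) == 0) {
                    bySize.remove(size);
                }
                size *= mergeFactor; // merged segment is the sum of its inputs
                bySize.merge(size, 1, Integer::sum);
            }
        }
        return bySize;
    }

    static int segmentCount(Map<Long, Integer> bySize) {
        int total = 0;
        for (int c : bySize.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("capped at 50 MB: " + simulate(1000, 10, 1, 50));
        System.out.println("no cap:          " + simulate(1000, 10, 1, Long.MAX_VALUE));
    }
}
```

In this model, 1000 flushes of 1 MB with a 50 MB cap leave ten 100 MB segments instead of one 1000 MB segment. The 100 MB segments are larger than the cap because the cap is only checked against merge inputs.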
Mike
Re: Index Merging Space Requirements
Posted by Mark Miller <ma...@gmail.com>.
Thanks a lot Mike... one more question:
I remember reading that a regular addDocument call could basically
trigger an optimize on a given call. Is this true? Maybe not true anymore?
It doesn't sound right to me, but I do remember reading about it. This
was before background merging, when it was mentioned that the addDocument
call could take a long time to return if basically the equivalent of an
optimize was triggered.
Could you clear this up for me?
Thanks
Mark
Re: Index Merging Space Requirements
Posted by Michael McCandless <lu...@mikemccandless.com>.
Yes, this should reduce transient (while merging) disk usage.
However, optimize disregards this parameter, so it will still use the
same disk space. If you call optimize(N) instead, that should
use less space since it does not merge all the way down to 1 segment.
Note that the limit applies to segments-to-be-merged, not to the final
merged segment size. I.e., any segment > maxMergeMB will never be
merged, but at any given time you can easily have segments quite a
bit larger than maxMergeMB.
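
To make the arithmetic concrete, here is a back-of-the-envelope estimate in plain Java. It is only a sketch: the names are hypothetical, the segment sizes are invented, the merged segment is assumed to be about as large as the sum of its inputs, and the old segments are assumed to stay on disk until the merge finishes. It also ignores that a real merge picks at most mergeFactor segments at a time.

```java
import java.util.Arrays;
import java.util.List;

// Rough estimate only (hypothetical names, not the Lucene API).
final class MergeSpaceEstimate {

    /** Size of the segment produced if every eligible segment is merged. */
    static long mergedOutputMB(List<Long> segmentMB, long maxMergeMB) {
        long total = 0;
        for (long size : segmentMB) {
            if (size <= maxMergeMB) { // the cap is applied to merge inputs only
                total += size;
            }
        }
        return total;
    }

    /** Size of the single segment written by a full optimize (cap ignored). */
    static long optimizeOutputMB(List<Long> segmentMB) {
        long total = 0;
        for (long size : segmentMB) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up index: four small segments and one already over the cap.
        List<Long> segments = Arrays.asList(40L, 45L, 48L, 30L, 400L);
        long maxMergeMB = 50;

        System.out.println("capped merge writes about "
                + mergedOutputMB(segments, maxMergeMB) + " MB of new data");
        System.out.println("full optimize writes about "
                + optimizeOutputMB(segments) + " MB of new data");
    }
}
```

For these made-up sizes the capped merge writes roughly 163 MB, which is already more than three times the 50 MB cap, while the 400 MB segment is left alone. A full optimize would rewrite all 563 MB.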
Mike