You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2008/03/13 14:33:35 UTC

Index Merging Space Requirements

If I use LogByteSizeMergePolicy#setMaxMergeMB, can I clamp down on the 
space needed for optimize/merge? My thought is, if a segment is maxed 
out, it will never need to be copied for a merge right? So you could 
significantly reduce merge/optimize space requirments (now at like 2x-4x 
if readers can still open)?

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Index Merging Space Requirements

Posted by Michael McCandless <lu...@mikemccandless.com>.
Well ... yes and no?

Yes, the Log*MergePolicy will still at certain times merge the index  
all the way down to one segment.  If mergeFactor is 10 then this will  
happen every "power of 10" flushed segments.  Ie, after 10 flushes a  
merge will merge them down to 1 segment, then after 100 flushes as  
well, 1000 flushes, etc.  These are done in the background with  
ConcurrentMergeScheduler.  I don't really like this quality of the  
merge policy ... it's sort of a "pay it forward" approach.  You are  
paying in advance for the expectation that the index is going to keep  
getting larger.

However, this merging does respect maxMergeMB, so if you've set that,  
then it will not merge down to 1 segment once you have segments over  
that size.  So in this aspect it's different from a real call to  
optimize.

Mike

Mark Miller wrote:

> Thanks a lot Mike...one more question:
>
> I remember reading that a regular addDocument call could basically  
> trigger an optimize on a given call. Is this true? Maybe not true  
> anymore?
>
> It doesnt sound right to me, but I do remember reading about it.  
> This was pre background merging when it was mentioned the  
> addDocument call could take a long time to return if basically the  
> equiv of an optmize was triggered.
>
> Could you clear this up for me?
>
> Thanks
> Mark
>
> Michael McCandless wrote:
>>
>> Yes this should reduce transient (while merging) disk usage.   
>> However, optimize disregards this parameter, so it will still use  
>> the same disk space. However, if you call optimize(N) then that  
>> should use less space since it does not merge all the way down to  
>> 1 segment.
>>
>> Note that the limit applies to segments-to-be-merged not to the  
>> final merged segment size.  Ie, any segment > maxMergeMB will  
>> never be merged, but at any given time you can easily have  
>> segments quite a bit larger than maxMergeMB.
>>
>> Mike
>>
>> Mark Miller wrote:
>>
>>> If I use LogByteSizeMergePolicy#setMaxMergeMB, can I clamp down  
>>> on the space needed for optimize/merge? My thought is, if a  
>>> segment is maxed out, it will never need to be copied for a merge  
>>> right? So you could significantly reduce merge/optimize space  
>>> requirments (now at like 2x-4x if readers can still open)?
>>>
>>> - Mark
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Index Merging Space Requirements

Posted by Mark Miller <ma...@gmail.com>.
Thanks a lot Mike...one more question:

I remember reading that a regular addDocument call could basically 
trigger an optimize on a given call. Is this true? Maybe not true anymore?

It doesnt sound right to me, but I do remember reading about it. This 
was pre background merging when it was mentioned the addDocument call 
could take a long time to return if basically the equiv of an optmize 
was triggered.

Could you clear this up for me?

Thanks
Mark

Michael McCandless wrote:
>
> Yes this should reduce transient (while merging) disk usage.  However, 
> optimize disregards this parameter, so it will still use the same disk 
> space. However, if you call optimize(N) then that should use less 
> space since it does not merge all the way down to 1 segment.
>
> Note that the limit applies to segments-to-be-merged not to the final 
> merged segment size.  Ie, any segment > maxMergeMB will never be 
> merged, but at any given time you can easily have segments quite a bit 
> larger than maxMergeMB.
>
> Mike
>
> Mark Miller wrote:
>
>> If I use LogByteSizeMergePolicy#setMaxMergeMB, can I clamp down on 
>> the space needed for optimize/merge? My thought is, if a segment is 
>> maxed out, it will never need to be copied for a merge right? So you 
>> could significantly reduce merge/optimize space requirments (now at 
>> like 2x-4x if readers can still open)?
>>
>> - Mark
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Index Merging Space Requirements

Posted by Michael McCandless <lu...@mikemccandless.com>.
Yes this should reduce transient (while merging) disk usage.   
However, optimize disregards this parameter, so it will still use the  
same disk space. However, if you call optimize(N) then that should  
use less space since it does not merge all the way down to 1 segment.

Note that the limit applies to segments-to-be-merged not to the final  
merged segment size.  Ie, any segment > maxMergeMB will never be  
merged, but at any given time you can easily have segments quite a  
bit larger than maxMergeMB.

Mike

Mark Miller wrote:

> If I use LogByteSizeMergePolicy#setMaxMergeMB, can I clamp down on  
> the space needed for optimize/merge? My thought is, if a segment is  
> maxed out, it will never need to be copied for a merge right? So  
> you could significantly reduce merge/optimize space requirments  
> (now at like 2x-4x if readers can still open)?
>
> - Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org