Posted to java-user@lucene.apache.org by Michael van Rooyen <mi...@loot.co.za> on 2013/10/08 08:53:51 UTC

Re: Lucene 4.4.0 mergeSegments OutOfMemoryError

With forceMerge(1) throwing an OOM error, we switched to 
forceMergeDeletes() which worked for a while, but that is now also 
running out of memory.  As a result, I've turned all manner of forced 
merges off.
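
For anyone else in the same spot: rather than forcing merges at all, the default TieredMergePolicy can be biased so that deletes are folded away during ordinary background merging. A rough, untested sketch against the 4.4 API (the class name and tuning values here are illustrative, not from our actual code):

```java
import org.apache.lucene.index.TieredMergePolicy;

public class ReclaimDeletes {
    public static TieredMergePolicy deleteBiasedPolicy() {
        TieredMergePolicy mp = new TieredMergePolicy();
        // A higher weight biases merge selection toward segments with many
        // deleted docs, so dead space is reclaimed by normal background
        // merges without ever calling forceMerge()/forceMergeDeletes().
        mp.setReclaimDeletesWeight(3.0);          // default is 2.0
        // If forceMergeDeletes() is ever used again, only rewrite segments
        // whose deletion percentage exceeds this threshold.
        mp.setForceMergeDeletesPctAllowed(10.0);  // default is 10.0
        return mp;
    }
}
```

The policy is installed via IndexWriterConfig.setMergePolicy() before opening the IndexWriter.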

I'm more than a little apprehensive that if the OOM error can happen as 
part of a forced merge, then it may also be able to happen as part of 
normal merges as the index grows.  I'd be grateful if someone who's 
grokked the code for segment merges could shed some light on whether I'm 
worrying unnecessarily...

Thanks,
Michael.

On 2013/09/26 01:43 PM, Michael van Rooyen wrote:
> Thanks for the suggestion Ian.  I switched the optimization to do 
> forceMergeDeletes() instead of forceMerge(1) and it completed 
> successfully, so we will use that instead.  At least then we're 
> guaranteed to have no more than 10% of dead space in the index.
>
> I love the videos on Mike's post - I've always thought that the Lucene 
> segment/merge mechanism is such an elegant and efficient way of 
> handling a dynamic index.
>
> Michael.
>
> On 2013/09/26 12:45 PM, Ian Lea wrote:
>> There's a blog posting from Mike McCandless about merging at
>> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html.
>> Not very recent but probably still relevant.
>>
>> You could try IndexWriter.forceMergeDeletes() rather than
>> forceMerge(1).  Still costly but probably less so, and might complete!
>>
>>
>> -- 
>> Ian.
>>
>>
>> On Thu, Sep 26, 2013 at 11:25 AM, Michael van Rooyen 
>> <mi...@loot.co.za> wrote:
>>> Yes, it happens as part of the early morning optimize, and yes, it's a
>>> forceMerge(1) which I've disabled for now.
>>>
>>> I haven't looked at the persistence mechanism for Lucene since 2.x, but
>>> if I remember correctly, the deleted documents would stay in an index
>>> segment until that segment was eventually merged.  Without forcing a
>>> merge (optimize in old versions), the footprint on disk could be a
>>> multiple of the actual space required for the live documents, and this
>>> would have an impact on performance (the deleted documents would
>>> clutter the buffer cache).
>>>
>>> Is this still the case?  I would have thought it good practice to force
>>> the dead space out of an index periodically, but if the underlying
>>> storage mechanism has changed and the current index files are more
>>> efficient at housekeeping, this may no longer be necessary.
>>>
>>> If someone could shed a little light on best practice for indexes where
>>> documents are frequently updated (i.e. deleted and re-added), that
>>> would be great.
>>>
>>> Michael.
>>>
>>>
>>> On 2013/09/26 11:43 AM, Ian Lea wrote:
>>>> Is this OOM happening as part of your early morning optimize or at
>>>> some other point?  By optimize do you mean IndexWriter.forceMerge(1)?
>>>> You really shouldn't have to use that. If the index grows forever
>>>> without it then something else is going on which you might wish to
>>>> report separately.
>>>>
>>>>
>>>> -- 
>>>> Ian.
>>>>
>>>>
>>>> On Wed, Sep 25, 2013 at 12:35 PM, Michael van Rooyen <mi...@loot.co.za>
>>>> wrote:
>>>>> We've recently upgraded to Lucene 4.4.0 and mergeSegments now causes
>>>>> an OOM error.
>>>>>
>>>>> As background, our index contains about 14 million documents (growing
>>>>> slowly) and we process about 1 million updates per day.  It's about
>>>>> 8GB on disk.  I'm not sure if the Lucene segments merge the way they
>>>>> used to in the early versions, but we've always optimized at 3am to
>>>>> get rid of dead space in the index, or otherwise it grows forever.
>>>>>
>>>>> mergeSegments was working under 4.3.1 but the index has grown somewhat
>>>>> on disk since then, probably due to a couple of added NumericDocValues
>>>>> fields.  The java process is assigned about 3GB (the maximum, as it's
>>>>> running on a 32-bit i686 Linux box), and it still goes OOM.
>>>>>
>>>>> Any advice as to the possible cause and how to circumvent it would be
>>>>> great.  Here's the stack trace:
>>>>>
>>>>> org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
>>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>>     at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:212)
>>>>>     at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:174)
>>>>>     at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
>>>>>     at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:253)
>>>>>     at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:215)
>>>>>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
>>>>>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
>>>>>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>>>>
>>>>> Thanks,
>>>>> Michael.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>




Re: Lucene 4.4.0 mergeSegments OutOfMemoryError

Posted by Michael McCandless <lu...@mikemccandless.com>.
When you open this index for searching, how much heap do you give it?
In general, you should give IndexWriter the same heap size, since
during merge it will need to open N readers at once, and if you have
RAM resident doc values fields, those need enough heap space.
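
One concrete knob, if the heap can't be raised: merge fewer segments at once, so fewer readers (and their RAM-resident doc values and norms) are open at the same time. A rough sketch against the 4.4 API; the class name and values are illustrative:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class SmallerMerges {
    public static IndexWriterConfig heapFriendlyConfig() {
        TieredMergePolicy mp = new TieredMergePolicy();
        // Each input segment of a merge is opened as a SegmentReader, and
        // each reader's heap-resident doc values count against the heap
        // simultaneously; merging fewer segments at once lowers the peak.
        mp.setMaxMergeAtOnce(4);              // default is 10
        mp.setMaxMergeAtOnceExplicit(6);      // same cap for forced merges
        mp.setMaxMergedSegmentMB(512.0);      // keep individual merges small
        IndexWriterConfig iwc = new IndexWriterConfig(
                Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
        iwc.setMergePolicy(mp);
        return iwc;
    }
}
```

The trade-off is more, smaller merges overall, but each one needs less heap headroom.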

Also, the default DocValuesFormat in 4.5 has changed to be mostly
disk-based; if you upgrade & cutover your index, then you should need
much less heap to open readers / do merging.
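
A rough sketch of what that cutover might look like (Lucene45Codec is already the default codec when running 4.5, so setting it explicitly is only for clarity; the path and analyzer are placeholders):

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.lucene45.Lucene45Codec;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class UpgradeCodec {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("/path/to/index"));
        IndexWriterConfig iwc = new IndexWriterConfig(
                Version.LUCENE_45, new StandardAnalyzer(Version.LUCENE_45));
        // 4.5's default DocValuesFormat is mostly disk-based, so readers
        // opened during merging need far less heap than the 4.2 format.
        iwc.setCodec(new Lucene45Codec());
        IndexWriter writer = new IndexWriter(dir, iwc);
        // Old-format segments are rewritten into the new codec as they
        // are merged over time; no explicit reindex is required.
        writer.commit();
        writer.close();
    }
}
```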


Mike McCandless

http://blog.mikemccandless.com


