Posted to java-user@lucene.apache.org by Daniel Taurat <da...@gaussvip.com> on 2004/09/09 19:47:36 UTC

Out of memory in lucene 1.4.1 when re-indexing large number of documents

Hi,
I am facing an out of memory problem using Lucene 1.4.1.
I am re-indexing a fairly large number (about 30,000) of documents.
I identify old instances by checking for a unique ID field, delete those 
with indexReader.delete(), and add the new document version.

A heap dump shows a huge number of HashMaps holding 
SegmentTermEnum objects (256,891).

IndexReader is closed directly after delete(term)...
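
For reference, the update step looks roughly like this (a rough sketch; the
"uid" field name and the index path are placeholders for our actual setup):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class Reindexer {
  /** Delete the old version by its unique ID field, then add the new document. */
  public static void update(String indexPath, String uniqueId, Document newVersion)
      throws Exception {
    IndexReader reader = IndexReader.open(indexPath);
    reader.delete(new Term("uid", uniqueId));  // removes all docs carrying this ID
    reader.close();                            // closed directly after delete(term)

    IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
    writer.addDocument(newVersion);            // the new document version
    writer.close();
  }
}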

This did not seem to happen with version 1.2 (same number of objects 
and all...).
Does anyone have an idea why I am getting these "hanging" objects, or what 
to do in order to avoid them?

Thanks
Daniel

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: question on Hits.doc

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello Roy,

This sounds normal.  When you pull a Document from Hits, you are really
pulling it from disk.  All fields are read from disk at that time
(i.e. no lazy loading of fields), so if you have large text fields,
this is going to result in a lot of disk IO.  You could try running
vmstat or sar (I'm assuming you are using a UNIX flavour) and look at
the bi/bo columns (really just bi; bi = blocks in -- data read from
disk).

There is not much you can do about the read itself.  If you don't have to
store the field, not storing it will probably help.  Some people are working
on adding support for field compression, so maybe that will help as well.
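
For example, with the 1.2-style Field factory methods the big field can be
indexed without being stored (a rough sketch; the "Message" name is from Roy's
mail, the ID field is just an illustration):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class MessageDocs {
  /** Index the big "Message" text without storing it; store only a small ID. */
  public static Document build(String messageId, String messageText) {
    Document doc = new Document();
    doc.add(Field.Keyword("messageId", messageId));   // stored, indexed, untokenized
    doc.add(Field.UnStored("Message", messageText));  // indexed, tokenized, NOT stored
    return doc;
  }
}

With that, Hits.doc() no longer has to drag the message text off disk.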

Otis

--- roy-lucene-user@xemaps.com wrote:

> Hey guys,
> 
> We were noticing some speed problems on our searches and after adding
> some
> debug statements to the lucene source code, we have determined that
> the
> Hits.doc(x) is the problem.  (BTW, we are using Lucene 1.2 [with
> plans to
> upgrade]).  It seems that retrieving the actual Document from the
> search is
> very slow.
> 
> We think it might be our "Message" field which stores a huge amount
> of text. 
> We are currently running a test in which we won't "store" the
> "Message" field,
> however, I was wondering if any of you guys would know if that would
> be the
> reason why we're having the performance problems?  If so, could
> anyone also
> please explain it?  It seemed that we weren't having these
> performance
> problems before.  Has anyone else experienced this?  Our environment
> is NT 4,
> JDK 1.4.2, and PIIIs.
> 
> I know that for large text fields, storing the field is not a good
> practice,
> however, it held certain conveniences for us that I hope to not get
> rid of.
> 
> Roy.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


question on Hits.doc

Posted by ro...@xemaps.com.
Hey guys,

We were noticing some speed problems on our searches, and after adding some
debug statements to the Lucene source code we determined that
Hits.doc(x) is the problem.  (BTW, we are using Lucene 1.2, with plans to
upgrade.)  It seems that retrieving the actual Document from the search is
very slow.
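
(For reference, the same measurement can be taken directly around the call
without patching the Lucene source; hits and i below come from our search loop:)

long start = System.currentTimeMillis();
org.apache.lucene.document.Document d = hits.doc(i);
System.err.println("Hits.doc(" + i + ") took "
    + (System.currentTimeMillis() - start) + " ms");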

We think it might be our "Message" field, which stores a huge amount of text.
We are currently running a test in which we don't "store" the "Message" field.
However, would any of you know whether that could be the reason for the
performance problems we're having, and if so, could you explain it?  It seemed
that we weren't having these performance problems before.  Has anyone else
experienced this?  Our environment is NT 4, JDK 1.4.2, and PIIIs.

I know that storing large text fields is not good practice; however, it has
certain conveniences for us that I hope not to give up.

Roy.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
Daniel Taurat wrote:

> Hi Pete,
> good hint, but we actually do have physical memory of  4Gb on the 
> system. But then: we also have experienced that the gc of ibm jdk1.3.1 
> that we use is sometimes
> behaving strangely with too large heap space anyway. (Limit seems to 
> be 1.2 Gb)

Depends on what OS and with what patches...

Linux on i386 seems to have a physical limit of 1.7G (256M for VM) ... 
There are some patches to apply to get 3G but only on really modern kernels.

I just need to get Athlon systems :-/

Kevin

-- 

Please reply using PGP.

    http://peerfear.org/pubkey.asc    
    
    NewsMonster - http://www.newsmonster.org/
    
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Daniel Taurat <da...@gaussvip.com>.
The parser is PDFBox. PDF makes up about 25% of the overall indexing volume 
on the production system. I also have Word docs and loads of HTML 
resources to be indexed.
In my testing environment I have only 5 PDF docs, and still those 
permanent objects are hanging around, though.
Cheers,
Daniel

Ben Litchfield wrote:

>>I can say that gc is not collecting these objects since I  forced gc
>>runs when indexing every now and then (when parsing pdf-type objects,
>>that is): No effect.
>>    
>>
>
> What PDF parser are you using? Is the problem within the parser and not
> lucene? Are you releasing all resources?
> Ben
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Ben Litchfield <be...@csh.rit.edu>.
> I can say that gc is not collecting these objects since I  forced gc
> runs when indexing every now and then (when parsing pdf-type objects,
> that is): No effect.

What PDF parser are you using?  Is the problem within the parser and not
lucene?  Are you releasing all resources?
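
Every PDDocument should be closed once the text has been extracted; a rough
sketch (assuming the org.pdfbox package layout of that era, names illustrative):

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringWriter;

import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

public class PdfText {
  /** Extract the text of a PDF and always release the PDFBox document. */
  public static String extract(String path) throws Exception {
    InputStream in = new FileInputStream(path);
    PDDocument doc = null;
    try {
      doc = PDDocument.load(in);
      StringWriter out = new StringWriter();
      new PDFTextStripper().writeText(doc, out);
      return out.toString();
    } finally {
      if (doc != null) {
        doc.close();   // otherwise parser objects can pile up between documents
      }
      in.close();
    }
  }
}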

Ben

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by John Moylan <jo...@rte.ie>.
The IBM JDK 1.4.2 should work fine. AFAIK, JDK 1.3.1 is usable if you disable the JIT.
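
On most 1.3 VMs the JIT can be switched off with the java.compiler property,
e.g. (the indexer class name is just a placeholder):

  java -Djava.compiler=NONE -cp $CLASSPATH your.package.Indexer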

John

Daniel Taurat wrote:
> Hi Doug,
> you are absolutely right about the older version of the JDK: it is 1.3.1 
> (ibm).
> Unfortunately we cannot upgrade since we are bound to IBM Portalserver 4 
> environment.
> Results:
> I patched the Lucene1.4.1:
> it has improved not much: after indexing 1897 Objects  the number of 
> SegmentTermEnum is up to 17936.
> To be realistic: This is even a deterioration :(((
> My next check will be with a JDK1.4.2 for the test environment, but this 
> can only be a reference run for now.
> 
> Thanks,
> Daniel
> 
> Doug Cutting wrote:
> 
>> It sounds like the ThreadLocal in TermInfosReader is not getting 
>> correctly garbage collected when the TermInfosReader is collected. 
>> Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess 
>> is that you're running in an older JVM.  Is that right?
> 
> 
>>
>> I've attached a patch which should fix this.  Please tell me if it 
>> works for you.
>>
>> Doug
>>
>> Daniel Taurat wrote:
>>
>>> Okay, that (1.4rc3)worked fine, too!
>>> Got only 257 SegmentTermEnums for 1900 objects.
>>>
>>> Now I will go for the final test on the production server with the 
>>> 1.4rc3 version  and about 40.000 objects.
>>>
>>> Daniel
>>>
>>> Daniel Taurat schrieb:
>>>
>>>> Hi all,
>>>> here is some update for you:
>>>> I switched back to Lucene 1.3-final and now the  number of the  
>>>> SegmentTermEnum objects is controlled by gc again:
>>>> it goes up to about 1000 and then it is down again to 254 after 
>>>> indexing my 1900 test-objects.
>>>> Stay tuned, I will try 1.4RC3 now, the last version before 
>>>> FieldCache was introduced...
>>>>
>>>> Daniel
>>>>
>>>>
>>>> Rupinder Singh Mazara schrieb:
>>>>
>>>>> hi all
>>>>>  I had a similar problem, i have  database of documents with 24 
>>>>> fields, and a average content of 7K, with  16M+ records
>>>>>
>>>>>  i had to split the jobs into slabs of 1M each and merging the 
>>>>> resulting indexes, submissions to our job queue looked like
>>>>>
>>>>>  java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>>>  
>>>>> and i still had outofmemory exception , the solution that i created 
>>>>> was to after every 200K, documents create a temp directory, and 
>>>>> merge them together, this was done to do the first production run, 
>>>>> updates are now being handled incrementally
>>>>>
>>>>>  
>>>>>
>>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>>> at 
>>>>> org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined 
>>>>> Compiled Code))
>>>>>     at 
>>>>> org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined 
>>>>> Compiled Code))
>>>>>     at 
>>>>> org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>>>>     at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>>>     at lucene.Indexer.main(CDBIndexer.java:168)
>>>>>
>>>>>  
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>>>>> Sent: 10 September 2004 14:42
>>>>>> To: Lucene Users List
>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large 
>>>>>> number
>>>>>> of documents
>>>>>>
>>>>>>
>>>>>> Hi Pete,
>>>>>> good hint, but we actually do have physical memory of  4Gb on the 
>>>>>> system. But then: we also have experienced that the gc of ibm 
>>>>>> jdk1.3.1 that we use is sometimes
>>>>>> behaving strangely with too large heap space anyway. (Limit seems 
>>>>>> to be 1.2 Gb)
>>>>>> I can say that gc is not collecting these objects since I  forced 
>>>>>> gc runs when indexing every now and then (when parsing pdf-type 
>>>>>> objects, that is): No effect.
>>>>>>
>>>>>> regards,
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>>
>>>>>> Pete Lewis wrote:
>>>>>>
>>>>>>  
>>>>>>
>>>>>>> Hi all
>>>>>>>
>>>>>>> Reading the thread with interest, there is another way I've come     
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> across out
>>>>>>  
>>>>>>
>>>>>>> of memory errors when indexing large batches of documents.
>>>>>>>
>>>>>>> If you have your heap space settings too high, then you get     
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> swapping (which
>>>>>>  
>>>>>>
>>>>>>> impacts performance) plus you never reach the trigger for garbage
>>>>>>> collection, hence you don't garbage collect and hence you run 
>>>>>>> out     
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> of memory.
>>>>>>  
>>>>>>
>>>>>>> Can you check whether or not your garbage collection is being 
>>>>>>> triggered?
>>>>>>>
>>>>>>> Anomalously therefore if this is the case, by reducing the heap 
>>>>>>> space you
>>>>>>> can improve performance get rid of the out of memory errors.
>>>>>>>
>>>>>>> Cheers
>>>>>>> Pete Lewis
>>>>>>>
>>>>>>> ----- Original Message ----- From: "Daniel Taurat" 
>>>>>>> <da...@gaussvip.com>
>>>>>>> To: "Lucene Users List" <lu...@jakarta.apache.org>
>>>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing 
>>>>>>> large     
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> number of
>>>>>>  
>>>>>>
>>>>>>> documents
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  
>>>>>>>
>>>>>>>> Daniel Aber schrieb:
>>>>>>>>
>>>>>>>>  
>>>>>>>>    
>>>>>>>>
>>>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>         
>>>>>>>>>
>>>>>>>>>> I am facing an out of memory problem using  Lucene 1.4.1.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Could you try with a recent CVS version? There has been a fix 
>>>>>>>>>         
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>> about files
>>>>>>  
>>>>>>
>>>>>>>>> not being deleted after 1.4.1. Not sure if that could cause the 
>>>>>>>>> problems
>>>>>>>>> you're experiencing.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Daniel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>            
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Well, it seems not to be files, it looks more like those 
>>>>>>>> SegmentTermEnum
>>>>>>>> objects accumulating in memory.
>>>>>>>> #I've seen some discussion on these objects in the 
>>>>>>>> developer-newsgroup
>>>>>>>> that had taken place some time ago.
>>>>>>>> I am afraid this is some kind of runaway caching I have to deal 
>>>>>>>> with.
>>>>>>>> Maybe not  correctly addressed in this newsgroup, after all...
>>>>>>>>
>>>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------- 
>>>>>>>>
>>>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>>> For additional commands, e-mail: 
>>>>>>>> lucene-user-help@jakarta.apache.org
>>>>>>>>
>>>>>>>>  
>>>>>>>>       
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------- 
>>>>>>>
>>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>>
>>>>>>
>>>>>>   
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>
>>>>>
>>>>>  
>>>>>
>>>>
>>>>
>>>
>>>
>> ------------------------------------------------------------------------
>>
>> Index: src/java/org/apache/lucene/index/TermInfosReader.java
>> ===================================================================
>> RCS file: 
>> /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v 
>>
>> retrieving revision 1.9
>> diff -u -r1.9 TermInfosReader.java
>> --- src/java/org/apache/lucene/index/TermInfosReader.java    6 Aug 
>> 2004 20:50:29 -0000    1.9
>> +++ src/java/org/apache/lucene/index/TermInfosReader.java    10 Sep 
>> 2004 17:46:47 -0000
>> @@ -45,6 +45,11 @@
>>     readIndex();
>>   }
>>
>> +  protected final void finalize() {
>> +    // patch for pre-1.4.2 JVMs, whose ThreadLocals leak
>> +    enumerators.set(null);
>> +  }
>> +
>>   public int getSkipInterval() {
>>     return origEnum.skipInterval;
>>   }
>>
>>  
>>
>> ------------------------------------------------------------------------
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Daniel Taurat <da...@gaussvip.com>.
Okay, the reference test is done:
on JDK 1.4.2, Lucene 1.4.1 really seems to run fine: just a moderate 
number of SegmentTermEnums that is controlled by gc (about 500 for the 
1900 test objects).


Daniel Taurat wrote:

> Hi Doug,
> you are absolutely right about the older version of the JDK: it is 
> 1.3.1 (ibm).
> Unfortunately we cannot upgrade since we are bound to IBM Portalserver 
> 4 environment.
> Results:
> I patched the Lucene1.4.1:
> it has improved not much: after indexing 1897 Objects  the number of 
> SegmentTermEnum is up to 17936.
> To be realistic: This is even a deterioration :(((
> My next check will be with a JDK1.4.2 for the test environment, but 
> this can only be a reference run for now.
>
> Thanks,
> Daniel
>
> Doug Cutting wrote:
>
>> It sounds like the ThreadLocal in TermInfosReader is not getting 
>> correctly garbage collected when the TermInfosReader is collected. 
>> Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess 
>> is that you're running in an older JVM.  Is that right?
>
>
>>
>> I've attached a patch which should fix this.  Please tell me if it 
>> works for you.
>>
>> Doug
>>
>> Daniel Taurat wrote:
>>
>>> Okay, that (1.4rc3)worked fine, too!
>>> Got only 257 SegmentTermEnums for 1900 objects.
>>>
>>> Now I will go for the final test on the production server with the 
>>> 1.4rc3 version  and about 40.000 objects.
>>>
>>> Daniel
>>>
>>> Daniel Taurat schrieb:
>>>
>>>> Hi all,
>>>> here is some update for you:
>>>> I switched back to Lucene 1.3-final and now the  number of the  
>>>> SegmentTermEnum objects is controlled by gc again:
>>>> it goes up to about 1000 and then it is down again to 254 after 
>>>> indexing my 1900 test-objects.
>>>> Stay tuned, I will try 1.4RC3 now, the last version before 
>>>> FieldCache was introduced...
>>>>
>>>> Daniel
>>>>
>>>>
>>>> Rupinder Singh Mazara schrieb:
>>>>
>>>>> hi all
>>>>>  I had a similar problem, i have  database of documents with 24 
>>>>> fields, and a average content of 7K, with  16M+ records
>>>>>
>>>>>  i had to split the jobs into slabs of 1M each and merging the 
>>>>> resulting indexes, submissions to our job queue looked like
>>>>>
>>>>>  java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>>>  
>>>>> and i still had outofmemory exception , the solution that i 
>>>>> created was to after every 200K, documents create a temp 
>>>>> directory, and merge them together, this was done to do the first 
>>>>> production run, updates are now being handled incrementally
>>>>>
>>>>>  
>>>>>
>>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>>> at 
>>>>> org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined 
>>>>> Compiled Code))
>>>>>     at 
>>>>> org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined 
>>>>> Compiled Code))
>>>>>     at 
>>>>> org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled 
>>>>> Code))
>>>>>     at 
>>>>> org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>>>>     at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>>>     at lucene.Indexer.main(CDBIndexer.java:168)
>>>>>
>>>>>  
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>>>>> Sent: 10 September 2004 14:42
>>>>>> To: Lucene Users List
>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large 
>>>>>> number
>>>>>> of documents
>>>>>>
>>>>>>
>>>>>> Hi Pete,
>>>>>> good hint, but we actually do have physical memory of  4Gb on the 
>>>>>> system. But then: we also have experienced that the gc of ibm 
>>>>>> jdk1.3.1 that we use is sometimes
>>>>>> behaving strangely with too large heap space anyway. (Limit seems 
>>>>>> to be 1.2 Gb)
>>>>>> I can say that gc is not collecting these objects since I  forced 
>>>>>> gc runs when indexing every now and then (when parsing pdf-type 
>>>>>> objects, that is): No effect.
>>>>>>
>>>>>> regards,
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>>
>>>>>> Pete Lewis wrote:
>>>>>>
>>>>>>  
>>>>>>
>>>>>>> Hi all
>>>>>>>
>>>>>>> Reading the thread with interest, there is another way I've 
>>>>>>> come     
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> across out
>>>>>>  
>>>>>>
>>>>>>> of memory errors when indexing large batches of documents.
>>>>>>>
>>>>>>> If you have your heap space settings too high, then you get     
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> swapping (which
>>>>>>  
>>>>>>
>>>>>>> impacts performance) plus you never reach the trigger for garbage
>>>>>>> collection, hence you don't garbage collect and hence you run 
>>>>>>> out     
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> of memory.
>>>>>>  
>>>>>>
>>>>>>> Can you check whether or not your garbage collection is being 
>>>>>>> triggered?
>>>>>>>
>>>>>>> Anomalously therefore if this is the case, by reducing the heap 
>>>>>>> space you
>>>>>>> can improve performance get rid of the out of memory errors.
>>>>>>>
>>>>>>> Cheers
>>>>>>> Pete Lewis
>>>>>>>
>>>>>>> ----- Original Message ----- From: "Daniel Taurat" 
>>>>>>> <da...@gaussvip.com>
>>>>>>> To: "Lucene Users List" <lu...@jakarta.apache.org>
>>>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing 
>>>>>>> large     
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> number of
>>>>>>  
>>>>>>
>>>>>>> documents
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  
>>>>>>>
>>>>>>>> Daniel Aber schrieb:
>>>>>>>>
>>>>>>>>  
>>>>>>>>    
>>>>>>>>
>>>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>         
>>>>>>>>>
>>>>>>>>>> I am facing an out of memory problem using  Lucene 1.4.1.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Could you try with a recent CVS version? There has been a fix 
>>>>>>>>>         
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>> about files
>>>>>>  
>>>>>>
>>>>>>>>> not being deleted after 1.4.1. Not sure if that could cause 
>>>>>>>>> the problems
>>>>>>>>> you're experiencing.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Daniel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>            
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Well, it seems not to be files, it looks more like those 
>>>>>>>> SegmentTermEnum
>>>>>>>> objects accumulating in memory.
>>>>>>>> #I've seen some discussion on these objects in the 
>>>>>>>> developer-newsgroup
>>>>>>>> that had taken place some time ago.
>>>>>>>> I am afraid this is some kind of runaway caching I have to deal 
>>>>>>>> with.
>>>>>>>> Maybe not  correctly addressed in this newsgroup, after all...
>>>>>>>>
>>>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------- 
>>>>>>>>
>>>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>>> For additional commands, e-mail: 
>>>>>>>> lucene-user-help@jakarta.apache.org
>>>>>>>>
>>>>>>>>  
>>>>>>>>       
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------- 
>>>>>>>
>>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>> For additional commands, e-mail: 
>>>>>>> lucene-user-help@jakarta.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------- 
>>>>>>
>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>>
>>>>>>
>>>>>>   
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>
>>>>>
>>>>>  
>>>>>
>>>>
>>>>
>>>
>>>
>> ------------------------------------------------------------------------
>>
>> Index: src/java/org/apache/lucene/index/TermInfosReader.java
>> ===================================================================
>> RCS file: 
>> /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v 
>>
>> retrieving revision 1.9
>> diff -u -r1.9 TermInfosReader.java
>> --- src/java/org/apache/lucene/index/TermInfosReader.java    6 Aug 
>> 2004 20:50:29 -0000    1.9
>> +++ src/java/org/apache/lucene/index/TermInfosReader.java    10 Sep 
>> 2004 17:46:47 -0000
>> @@ -45,6 +45,11 @@
>>     readIndex();
>>   }
>>
>> +  protected final void finalize() {
>> +    // patch for pre-1.4.2 JVMs, whose ThreadLocals leak
>> +    enumerators.set(null);
>> +  }
>> +
>>   public int getSkipInterval() {
>>     return origEnum.skipInterval;
>>   }
>>
>>  
>>
>> ------------------------------------------------------------------------
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Daniel Taurat <da...@gaussvip.com>.
Hi Doug,
you are absolutely right about the older version of the JDK: it is 1.3.1 
(IBM).
Unfortunately we cannot upgrade since we are bound to the IBM Portalserver 4 
environment.
Results:
I patched Lucene 1.4.1:
it has not improved much: after indexing 1897 objects the number of 
SegmentTermEnum objects is up to 17936.
To be realistic: this is actually a deterioration :(((
My next check will be with JDK 1.4.2 for the test environment, but this 
can only be a reference run for now.

Thanks,
Daniel

Doug Cutting wrote:

> It sounds like the ThreadLocal in TermInfosReader is not getting 
> correctly garbage collected when the TermInfosReader is collected. 
> Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess 
> is that you're running in an older JVM.  Is that right?

>
> I've attached a patch which should fix this.  Please tell me if it 
> works for you.
>
> Doug
>
> Daniel Taurat wrote:
>
>> Okay, that (1.4rc3)worked fine, too!
>> Got only 257 SegmentTermEnums for 1900 objects.
>>
>> Now I will go for the final test on the production server with the 
>> 1.4rc3 version  and about 40.000 objects.
>>
>> Daniel
>>
>> Daniel Taurat schrieb:
>>
>>> Hi all,
>>> here is some update for you:
>>> I switched back to Lucene 1.3-final and now the  number of the  
>>> SegmentTermEnum objects is controlled by gc again:
>>> it goes up to about 1000 and then it is down again to 254 after 
>>> indexing my 1900 test-objects.
>>> Stay tuned, I will try 1.4RC3 now, the last version before 
>>> FieldCache was introduced...
>>>
>>> Daniel
>>>
>>>
>>> Rupinder Singh Mazara schrieb:
>>>
>>>> hi all
>>>>  I had a similar problem, i have  database of documents with 24 
>>>> fields, and a average content of 7K, with  16M+ records
>>>>
>>>>  i had to split the jobs into slabs of 1M each and merging the 
>>>> resulting indexes, submissions to our job queue looked like
>>>>
>>>>  java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>>  
>>>> and i still had outofmemory exception , the solution that i created 
>>>> was to after every 200K, documents create a temp directory, and 
>>>> merge them together, this was done to do the first production run, 
>>>> updates are now being handled incrementally
>>>>
>>>>  
>>>>
>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>> at 
>>>> org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled 
>>>> Code))
>>>>     at 
>>>> org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined 
>>>> Compiled Code))
>>>>     at 
>>>> org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined 
>>>> Compiled Code))
>>>>     at 
>>>> org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled 
>>>> Code))
>>>>     at 
>>>> org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled 
>>>> Code))
>>>>     at 
>>>> org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled 
>>>> Code))
>>>>     at 
>>>> org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled 
>>>> Code))
>>>>     at 
>>>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled 
>>>> Code))
>>>>     at 
>>>> org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled 
>>>> Code))
>>>>     at 
>>>> org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>>>     at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>>     at lucene.Indexer.main(CDBIndexer.java:168)
>>>>
>>>>  
>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>>>> Sent: 10 September 2004 14:42
>>>>> To: Lucene Users List
>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large 
>>>>> number
>>>>> of documents
>>>>>
>>>>>
>>>>> Hi Pete,
>>>>> good hint, but we actually do have physical memory of  4Gb on the 
>>>>> system. But then: we also have experienced that the gc of ibm 
>>>>> jdk1.3.1 that we use is sometimes
>>>>> behaving strangely with too large heap space anyway. (Limit seems 
>>>>> to be 1.2 Gb)
>>>>> I can say that gc is not collecting these objects since I  forced 
>>>>> gc runs when indexing every now and then (when parsing pdf-type 
>>>>> objects, that is): No effect.
>>>>>
>>>>> regards,
>>>>>
>>>>> Daniel
>>>>>
>>>>>
>>>>> Pete Lewis wrote:
>>>>>
>>>>>  
>>>>>
>>>>>> Hi all
>>>>>>
>>>>>> Reading the thread with interest, there is another way I've come     
>>>>>
>>>>>
>>>>>
>>>>> across out
>>>>>  
>>>>>
>>>>>> of memory errors when indexing large batches of documents.
>>>>>>
>>>>>> If you have your heap space settings too high, then you get     
>>>>>
>>>>>
>>>>>
>>>>> swapping (which
>>>>>  
>>>>>
>>>>>> impacts performance) plus you never reach the trigger for garbage
>>>>>> collection, hence you don't garbage collect and hence you run 
>>>>>> out     
>>>>>
>>>>>
>>>>>
>>>>> of memory.
>>>>>  
>>>>>
>>>>>> Can you check whether or not your garbage collection is being 
>>>>>> triggered?
>>>>>>
>>>>>> Anomalously therefore if this is the case, by reducing the heap 
>>>>>> space you
>>>>>> can improve performance get rid of the out of memory errors.
>>>>>>
>>>>>> Cheers
>>>>>> Pete Lewis
>>>>>>
>>>>>> ----- Original Message ----- From: "Daniel Taurat" 
>>>>>> <da...@gaussvip.com>
>>>>>> To: "Lucene Users List" <lu...@jakarta.apache.org>
>>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing 
>>>>>> large     
>>>>>
>>>>>
>>>>>
>>>>> number of
>>>>>  
>>>>>
>>>>>> documents
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   
>>>>>>
>>>>>>> Daniel Aber schrieb:
>>>>>>>
>>>>>>>  
>>>>>>>     
>>>>>>>
>>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>          
>>>>>>>>
>>>>>>>>> I am facing an out of memory problem using  Lucene 1.4.1.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Could you try with a recent CVS version? There has been a fix 
>>>>>>>>         
>>>>>>>
>>>>>>>
>>>>>>>
>>>>> about files
>>>>>  
>>>>>
>>>>>>>> not being deleted after 1.4.1. Not sure if that could cause the 
>>>>>>>> problems
>>>>>>>> you're experiencing.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>            
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Well, it seems not to be files, it looks more like those 
>>>>>>> SegmentTermEnum
>>>>>>> objects accumulating in memory.
>>>>>>> #I've seen some discussion on these objects in the 
>>>>>>> developer-newsgroup
>>>>>>> that had taken place some time ago.
>>>>>>> I am afraid this is some kind of runaway caching I have to deal 
>>>>>>> with.
>>>>>>> Maybe not  correctly addressed in this newsgroup, after all...
>>>>>>>
>>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------- 
>>>>>>>
>>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>> For additional commands, e-mail: 
>>>>>>> lucene-user-help@jakarta.apache.org
>>>>>>>
>>>>>>>  
>>>>>>>       
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------- 
>>>>>>
>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>     
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>
>>>>>
>>>>>   
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>
>>>>
>>>>  
>>>>
>>>
>>>
>>
>>
>------------------------------------------------------------------------
>
>Index: src/java/org/apache/lucene/index/TermInfosReader.java
>===================================================================
>RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
>retrieving revision 1.9
>diff -u -r1.9 TermInfosReader.java
>--- src/java/org/apache/lucene/index/TermInfosReader.java	6 Aug 2004 20:50:29 -0000	1.9
>+++ src/java/org/apache/lucene/index/TermInfosReader.java	10 Sep 2004 17:46:47 -0000
>@@ -45,6 +45,11 @@
>     readIndex();
>   }
> 
>+  protected final void finalize() {
>+    // patch for pre-1.4.2 JVMs, whose ThreadLocals leak
>+    enumerators.set(null);
>+  }
>+
>   public int getSkipInterval() {
>     return origEnum.skipInterval;
>   }
>
>  
>
>------------------------------------------------------------------------
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Doug Cutting <cu...@apache.org>.
It sounds like the ThreadLocal in TermInfosReader is not getting 
correctly garbage collected when the TermInfosReader is collected. 
Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess is 
that you're running in an older JVM.  Is that right?

I've attached a patch which should fix this.  Please tell me if it works 
for you.
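
For reference, the patch simply clears the per-thread enumerator when the
reader is finalized; a minimal standalone sketch of that pattern (class and
field names here are illustrative, not the actual Lucene code):

public class PerThreadCache {
  private final ThreadLocal enumerators = new ThreadLocal();

  Object getEnumerator() {
    Object e = enumerators.get();
    if (e == null) {
      e = new Object();        // stands in for the cloned SegmentTermEnum
      enumerators.set(e);
    }
    return e;
  }

  protected final void finalize() {
    // patch for pre-1.4.2 JVMs, whose ThreadLocals leak
    enumerators.set(null);
  }
}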

Doug

Daniel Taurat wrote:
> Okay, that (1.4rc3)worked fine, too!
> Got only 257 SegmentTermEnums for 1900 objects.
> 
> Now I will go for the final test on the production server with the 
> 1.4rc3 version  and about 40.000 objects.
> 
> Daniel
> 
> Daniel Taurat schrieb:
> 
>> Hi all,
>> here is some update for you:
>> I switched back to Lucene 1.3-final and now the  number of the  
>> SegmentTermEnum objects is controlled by gc again:
>> it goes up to about 1000 and then it is down again to 254 after 
>> indexing my 1900 test-objects.
>> Stay tuned, I will try 1.4RC3 now, the last version before FieldCache 
>> was introduced...
>>
>> Daniel
>>
>>
>> Rupinder Singh Mazara schrieb:
>>
>>> hi all
>>>  I had a similar problem, i have  database of documents with 24 
>>> fields, and a average content of 7K, with  16M+ records
>>>
>>>  i had to split the jobs into slabs of 1M each and merging the 
>>> resulting indexes, submissions to our job queue looked like
>>>
>>>  java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>  
>>> and i still had outofmemory exception , the solution that i created 
>>> was to after every 200K, documents create a temp directory, and merge 
>>> them together, this was done to do the first production run, updates 
>>> are now being handled incrementally
>>>
>>>  
>>>
>>> Exception in thread "main" java.lang.OutOfMemoryError
>>> at 
>>> org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled 
>>> Code))
>>>     at 
>>> org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined 
>>> Compiled Code))
>>>     at 
>>> org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined 
>>> Compiled Code))
>>>     at 
>>> org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled 
>>> Code))
>>>     at 
>>> org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled 
>>> Code))
>>>     at 
>>> org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled 
>>> Code))
>>>     at 
>>> org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled 
>>> Code))
>>>     at 
>>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled 
>>> Code))
>>>     at 
>>> org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled 
>>> Code))
>>>     at 
>>> org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>>     at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>     at lucene.Indexer.main(CDBIndexer.java:168)
>>>
>>>  
>>>
>>>> -----Original Message-----
>>>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>>> Sent: 10 September 2004 14:42
>>>> To: Lucene Users List
>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large 
>>>> number
>>>> of documents
>>>>
>>>>
>>>> Hi Pete,
>>>> good hint, but we actually do have physical memory of  4Gb on the 
>>>> system. But then: we also have experienced that the gc of ibm 
>>>> jdk1.3.1 that we use is sometimes
>>>> behaving strangely with too large heap space anyway. (Limit seems to 
>>>> be 1.2 Gb)
>>>> I can say that gc is not collecting these objects since I  forced gc 
>>>> runs when indexing every now and then (when parsing pdf-type 
>>>> objects, that is): No effect.
>>>>
>>>> regards,
>>>>
>>>> Daniel
>>>>
>>>>
>>>> Pete Lewis wrote:
>>>>
>>>>  
>>>>
>>>>> Hi all
>>>>>
>>>>> Reading the thread with interest, there is another way I've come across
>>>>> out of memory errors when indexing large batches of documents.
>>>>>
>>>>> If you have your heap space settings too high, then you get swapping
>>>>> (which impacts performance) plus you never reach the trigger for garbage
>>>>> collection, hence you don't garbage collect and hence you run out of
>>>>> memory.
>>>>>
>>>>> Can you check whether or not your garbage collection is being triggered?
>>>>>
>>>>> Anomalously therefore if this is the case, by reducing the heap space you
>>>>> can improve performance and get rid of the out of memory errors.
>>>>>
>>>>> Cheers
>>>>> Pete Lewis
>>>>>
>>>>> ----- Original Message ----- From: "Daniel Taurat" 
>>>>> <da...@gaussvip.com>
>>>>> To: "Lucene Users List" <lu...@jakarta.apache.org>
>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number
>>>>> of documents
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>    
>>>>>
>>>>>> Daniel Aber schrieb:
>>>>>>
>>>>>>  
>>>>>>      
>>>>>>
>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>> I am facing an out of memory problem using  Lucene 1.4.1.
>>>>>>>>
>>>>>>>>
>>>>>>>>                
>>>>>>>
>>>>>>>
>>>>>>> Could you try with a recent CVS version? There has been a fix about files
>>>>>>> not being deleted after 1.4.1. Not sure if that could cause the problems
>>>>>>> you're experiencing.
>>>>>>>
>>>>>>> Regards
>>>>>>> Daniel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>            
>>>>>>
>>>>>>
>>>>>> Well, it seems not to be files, it looks more like those 
>>>>>> SegmentTermEnum
>>>>>> objects accumulating in memory.
>>>>>> #I've seen some discussion on these objects in the 
>>>>>> developer-newsgroup
>>>>>> that had taken place some time ago.
>>>>>> I am afraid this is some kind of runaway caching I have to deal with.
>>>>>> Maybe not  correctly addressed in this newsgroup, after all...
>>>>>>
>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>>
>>>>>>  
>>>>>>       
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>
>>>>>
>>>>>
>>>>>     
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>
>>>>
>>>>   
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>>
>>>  
>>>
>>
>>
> 
> 

Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Daniel Taurat <da...@gaussvip.com>.
Okay, that (1.4rc3) worked fine, too!
Got only 257 SegmentTermEnums for 1900 objects.

Now I will go for the final test on the production server with the 
1.4rc3 version and about 40,000 objects.

Daniel

Daniel Taurat schrieb:

> Hi all,
> here is some update for you:
> I switched back to Lucene 1.3-final and now the  number of the  
> SegmentTermEnum objects is controlled by gc again:
> it goes up to about 1000 and then it is down again to 254 after 
> indexing my 1900 test-objects.
> Stay tuned, I will try 1.4RC3 now, the last version before FieldCache 
> was introduced...
>
> Daniel
>
>
> Rupinder Singh Mazara schrieb:
>
>> hi all
>>  I had a similar problem, i have  database of documents with 24 
>> fields, and a average content of 7K, with  16M+ records
>>
>>  i had to split the jobs into slabs of 1M each and merging the 
>> resulting indexes, submissions to our job queue looked like
>>
>>  java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>  
>> and i still had outofmemory exception , the solution that i created 
>> was to after every 200K, documents create a temp directory, and merge 
>> them together, this was done to do the first production run, updates 
>> are now being handled incrementally
>>
>>  
>>
>> Exception in thread "main" java.lang.OutOfMemoryError
>> at 
>> org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled 
>> Code))
>>     at 
>> org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined 
>> Compiled Code))
>>     at 
>> org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined 
>> Compiled Code))
>>     at 
>> org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled 
>> Code))
>>     at 
>> org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled 
>> Code))
>>     at 
>> org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled 
>> Code))
>>     at 
>> org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled 
>> Code))
>>     at 
>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled 
>> Code))
>>     at 
>> org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled 
>> Code))
>>     at 
>> org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>     at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>     at lucene.Indexer.main(CDBIndexer.java:168)
>>
>>  
>>
>>> -----Original Message-----
>>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>> Sent: 10 September 2004 14:42
>>> To: Lucene Users List
>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large 
>>> number
>>> of documents
>>>
>>>
>>> Hi Pete,
>>> good hint, but we actually do have physical memory of  4Gb on the 
>>> system. But then: we also have experienced that the gc of ibm 
>>> jdk1.3.1 that we use is sometimes
>>> behaving strangely with too large heap space anyway. (Limit seems to 
>>> be 1.2 Gb)
>>> I can say that gc is not collecting these objects since I  forced gc 
>>> runs when indexing every now and then (when parsing pdf-type 
>>> objects, that is): No effect.
>>>
>>> regards,
>>>
>>> Daniel
>>>
>>>
>>> Pete Lewis wrote:
>>>
>>>   
>>>
>>>> Hi all
>>>>
>>>> Reading the thread with interest, there is another way I've come     
>>>
>>> across out
>>>   
>>>
>>>> of memory errors when indexing large batches of documents.
>>>>
>>>> If you have your heap space settings too high, then you get     
>>>
>>> swapping (which
>>>   
>>>
>>>> impacts performance) plus you never reach the trigger for garbage
>>>> collection, hence you don't garbage collect and hence you run out     
>>>
>>> of memory.
>>>   
>>>
>>>> Can you check whether or not your garbage collection is being 
>>>> triggered?
>>>>
>>>> Anomalously therefore if this is the case, by reducing the heap 
>>>> space you
>>>> can improve performance get rid of the out of memory errors.
>>>>
>>>> Cheers
>>>> Pete Lewis
>>>>
>>>> ----- Original Message ----- From: "Daniel Taurat" 
>>>> <da...@gaussvip.com>
>>>> To: "Lucene Users List" <lu...@jakarta.apache.org>
>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large     
>>>
>>> number of
>>>   
>>>
>>>> documents
>>>>
>>>>
>>>>
>>>>
>>>>     
>>>>
>>>>> Daniel Aber schrieb:
>>>>>
>>>>>  
>>>>>       
>>>>>
>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>    
>>>>>>         
>>>>>>
>>>>>>> I am facing an out of memory problem using  Lucene 1.4.1.
>>>>>>>
>>>>>>>
>>>>>>>      
>>>>>>>           
>>>>>>
>>>>>> Could you try with a recent CVS version? There has been a fix 
>>>>>>         
>>>>>
>>> about files
>>>   
>>>
>>>>>> not being deleted after 1.4.1. Not sure if that could cause the 
>>>>>> problems
>>>>>> you're experiencing.
>>>>>>
>>>>>> Regards
>>>>>> Daniel
>>>>>>
>>>>>>
>>>>>>
>>>>>>    
>>>>>>         
>>>>>
>>>>> Well, it seems not to be files, it looks more like those 
>>>>> SegmentTermEnum
>>>>> objects accumulating in memory.
>>>>> #I've seen some discussion on these objects in the 
>>>>> developer-newsgroup
>>>>> that had taken place some time ago.
>>>>> I am afraid this is some kind of runaway caching I have to deal with.
>>>>> Maybe not  correctly addressed in this newsgroup, after all...
>>>>>
>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Daniel
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>>
>>>>>  
>>>>>       
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>
>>>>
>>>>
>>>>     
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>>
>>>   
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>
>>  
>>
>
>


-- 
Best regards

    Dr. Daniel Taurat

    Senior Consultant
-- 
VIP ENTERPRISE 8 | THE POWER OF CONTENT AT WORK
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Gauss Interprise AG      Phone:  +49-40-3250-1508
Weidestr. 120 a          Mobile: +49-173-2418472
D- 22083 Hamburg         Fax:    +49-40-3250-191508
Germany                  E-Mail: daniel.taurat@gaussvip.com
                         Web:    http://www.gaussvip.com
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Daniel Taurat <da...@gaussvip.com>.
Hi all,
here is an update for you:
I switched back to Lucene 1.3-final, and now the number of 
SegmentTermEnum objects is controlled by gc again:
it goes up to about 1000 and then drops back to 254 after indexing 
my 1900 test objects.
Stay tuned, I will try 1.4RC3 now, the last version before FieldCache 
was introduced...

Daniel


Rupinder Singh Mazara schrieb:

>hi all 
>
>  I had a similar problem, i have  database of documents with 24 fields, and a average content of 7K, with  16M+ records
>
>  i had to split the jobs into slabs of 1M each and merging the resulting indexes, submissions to our job queue looked like
> 
>  java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>  
> and i still had outofmemory exception , the solution that i created was to after every 200K, documents create a temp directory, and merge them together, this was done to do the first production run, updates are now being handled incrementally
> 
>  
>
>Exception in thread "main" java.lang.OutOfMemoryError
>at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
>	at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
>	at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
>	at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
>	at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
>	at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
>	at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
>	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
>	at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
>	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>	at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>	at lucene.Indexer.main(CDBIndexer.java:168)
>
>  
>
>>-----Original Message-----
>>From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>Sent: 10 September 2004 14:42
>>To: Lucene Users List
>>Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number
>>of documents
>>
>>
>>Hi Pete,
>>good hint, but we actually do have physical memory of  4Gb on the 
>>system. But then: we also have experienced that the gc of ibm jdk1.3.1 
>>that we use is sometimes
>>behaving strangely with too large heap space anyway. (Limit seems to be 
>>1.2 Gb)
>>I can say that gc is not collecting these objects since I  forced gc 
>>runs when indexing every now and then (when parsing pdf-type objects, 
>>that is): No effect.
>>
>>regards,
>>
>>Daniel
>>
>>
>>Pete Lewis wrote:
>>
>>    
>>
>>>Hi all
>>>
>>>Reading the thread with interest, there is another way I've come across out
>>>of memory errors when indexing large batches of documents.
>>>
>>>If you have your heap space settings too high, then you get swapping (which
>>>impacts performance) plus you never reach the trigger for garbage
>>>collection, hence you don't garbage collect and hence you run out of memory.
>>>
>>>Can you check whether or not your garbage collection is being triggered?
>>>
>>>Anomalously therefore if this is the case, by reducing the heap space you
>>>can improve performance and get rid of the out of memory errors.
>>>
>>>Cheers
>>>Pete Lewis
>>>
>>>----- Original Message ----- 
>>>From: "Daniel Taurat" <da...@gaussvip.com>
>>>To: "Lucene Users List" <lu...@jakarta.apache.org>
>>>Sent: Friday, September 10, 2004 1:10 PM
>>>Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number of
>>>documents
>>>
>>>
>>> 
>>>
>>>      
>>>
>>>>Daniel Aber schrieb:
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>>>On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>
>>>>>
>>>>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>>>I am facing an out of memory problem using  Lucene 1.4.1.
>>>>>>
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>Could you try with a recent CVS version? There has been a fix 
>>>>>          
>>>>>
>>about files
>>    
>>
>>>>>not being deleted after 1.4.1. Not sure if that could cause the problems
>>>>>you're experiencing.
>>>>>
>>>>>Regards
>>>>>Daniel
>>>>>
>>>>>
>>>>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>Well, it seems not to be files, it looks more like those SegmentTermEnum
>>>>objects accumulating in memory.
>>>>#I've seen some discussion on these objects in the developer-newsgroup
>>>>that had taken place some time ago.
>>>>I am afraid this is some kind of runaway caching I have to deal with.
>>>>Maybe not  correctly addressed in this newsgroup, after all...
>>>>
>>>>Anyway: any idea if there is an API command to re-init caches?
>>>>
>>>>Thanks,
>>>>
>>>>Daniel
>>>>
>>>>
>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>> 
>>>
>>>      
>>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>
>>    
>>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>  
>


-- 
Kind regards

    Dr. Daniel Taurat

    Senior Consultant
-- 
VIP ENTERPRISE 8 | THE POWER OF CONTENT AT WORK
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Gauss Interprise AG      Phone:  +49-40-3250-1508
Weidestr. 120 a          Mobile: +49-173-2418472
D- 22083 Hamburg         Fax:    +49-40-3250-191508
Germany                  E-Mail: daniel.taurat@gaussvip.com
                         Web:    http://www.gaussvip.com
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Rupinder Singh Mazara <rs...@ebi.ac.uk>.

Hi all,

  I had a similar problem: I have a database of documents with 24 fields and an average content of 7K, with 16M+ records.

  I had to split the job into slabs of 1M documents each and merge the resulting indexes. Submissions to our job queue looked like

  java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22

 and I still got an OutOfMemoryError. The solution I came up with was to create a temporary index directory after every 200K documents and merge them together. That was done for the first production run; updates are now handled incrementally.
 
  

Exception in thread "main" java.lang.OutOfMemoryError
	at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
	at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
	at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
	at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
	at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
	at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
	at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
	at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
	at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
	at lucene.Indexer.main(CDBIndexer.java:168)
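
In case it is useful to anyone hitting the same wall, the kind of slab-and-merge batching described above looks roughly like the sketch below. It is written from memory against the 1.4-era API; the class name, batch size, paths and the fetchBatch() helper are only illustrative, not the code we actually run.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Sketch of the "index in slabs, then merge" idea (Lucene 1.4-era API).
// Batch size, paths and the fetchBatch() helper are placeholders.
public class SlabIndexer {

    static final int BATCH_SIZE = 200000; // flush into a temp index every 200K docs

    public static void main(String[] args) throws Exception {
        Directory mainDir = FSDirectory.getDirectory("/index/main", true);
        IndexWriter main = new IndexWriter(mainDir, new StandardAnalyzer(), true);

        int batch = 0;
        String[] texts;
        while ((texts = fetchBatch(batch, BATCH_SIZE)) != null) {
            // build each slab in its own small temporary index ...
            Directory tempDir = FSDirectory.getDirectory("/index/tmp" + batch, true);
            IndexWriter temp = new IndexWriter(tempDir, new StandardAnalyzer(), true);
            for (int i = 0; i < texts.length; i++) {
                Document doc = new Document();
                doc.add(Field.Text("contents", texts[i]));
                temp.addDocument(doc);
            }
            temp.close();
            // ... then fold the finished slab into the main index
            main.addIndexes(new Directory[] { tempDir });
            batch++;
        }
        main.optimize();   // the step that ran out of memory for us
        main.close();
    }

    // placeholder for whatever loads the next slab of records from the database
    static String[] fetchBatch(int batch, int size) {
        return null;       // return null when nothing is left
    }
}

The point of the temporary directories is just to keep each IndexWriter short-lived, so nothing can accumulate across the whole 16M-document run.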

>-----Original Message-----
>From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>Sent: 10 September 2004 14:42
>To: Lucene Users List
>Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number
>of documents
>
>
>Hi Pete,
>good hint, but we actually do have physical memory of  4Gb on the 
>system. But then: we also have experienced that the gc of ibm jdk1.3.1 
>that we use is sometimes
>behaving strangely with too large heap space anyway. (Limit seems to be 
>1.2 Gb)
>I can say that gc is not collecting these objects since I  forced gc 
>runs when indexing every now and then (when parsing pdf-type objects, 
>that is): No effect.
>
>regards,
>
>Daniel
>
>
>Pete Lewis wrote:
>
>>Hi all
>>
>>Reading the thread with interest, there is another way I've come 
>across out
>>of memory errors when indexing large batches of documents.
>>
>>If you have your heap space settings too high, then you get 
>swapping (which
>>impacts performance) plus you never reach the trigger for garbage
>>collection, hence you don't garbage collect and hence you run out 
>of memory.
>>
>>Can you check whether or not your garbage collection is being triggered?
>>
>>Anomalously therefore if this is the case, by reducing the heap space you
>>can improve performance get rid of the out of memory errors.
>>
>>Cheers
>>Pete Lewis
>>
>>----- Original Message ----- 
>>From: "Daniel Taurat" <da...@gaussvip.com>
>>To: "Lucene Users List" <lu...@jakarta.apache.org>
>>Sent: Friday, September 10, 2004 1:10 PM
>>Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large 
>number of
>>documents
>>
>>
>>  
>>
>>>Daniel Aber schrieb:
>>>
>>>    
>>>
>>>>On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>
>>>>
>>>>
>>>>      
>>>>
>>>>>I am facing an out of memory problem using  Lucene 1.4.1.
>>>>>
>>>>>
>>>>>        
>>>>>
>>>>Could you try with a recent CVS version? There has been a fix 
>about files
>>>>not being deleted after 1.4.1. Not sure if that could cause the problems
>>>>you're experiencing.
>>>>
>>>>Regards
>>>>Daniel
>>>>
>>>>
>>>>
>>>>      
>>>>
>>>Well, it seems not to be files, it looks more like those SegmentTermEnum
>>>objects accumulating in memory.
>>>#I've seen some discussion on these objects in the developer-newsgroup
>>>that had taken place some time ago.
>>>I am afraid this is some kind of runaway caching I have to deal with.
>>>Maybe not  correctly addressed in this newsgroup, after all...
>>>
>>>Anyway: any idea if there is an API command to re-init caches?
>>>
>>>Thanks,
>>>
>>>Daniel
>>>
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>>    
>>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>  
>>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Daniel Taurat <da...@gaussvip.com>.
Hi Pete,
good hint, but we actually do have 4 GB of physical memory on the
system. That said, we have also seen the GC of the IBM JDK 1.3.1 we use
behave strangely when the heap is too large anyway (the limit seems to be
about 1.2 GB).
I can say that the GC is not collecting these objects: I forced GC runs
every now and then during indexing (when parsing PDF-type objects, that
is), with no effect.

regards,

Daniel


Pete Lewis wrote:

>Hi all
>
>Reading the thread with interest, there is another way I've come across out
>of memory errors when indexing large batches of documents.
>
>If you have your heap space settings too high, then you get swapping (which
>impacts performance) plus you never reach the trigger for garbage
>collection, hence you don't garbage collect and hence you run out of memory.
>
>Can you check whether or not your garbage collection is being triggered?
>
>Anomalously therefore if this is the case, by reducing the heap space you
>can improve performance get rid of the out of memory errors.
>
>Cheers
>Pete Lewis
>
>----- Original Message ----- 
>From: "Daniel Taurat" <da...@gaussvip.com>
>To: "Lucene Users List" <lu...@jakarta.apache.org>
>Sent: Friday, September 10, 2004 1:10 PM
>Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number of
>documents
>
>
>  
>
>>Daniel Aber schrieb:
>>
>>    
>>
>>>On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>
>>>
>>>
>>>      
>>>
>>>>I am facing an out of memory problem using  Lucene 1.4.1.
>>>>
>>>>
>>>>        
>>>>
>>>Could you try with a recent CVS version? There has been a fix about files
>>>not being deleted after 1.4.1. Not sure if that could cause the problems
>>>you're experiencing.
>>>
>>>Regards
>>>Daniel
>>>
>>>
>>>
>>>      
>>>
>>Well, it seems not to be files, it looks more like those SegmentTermEnum
>>objects accumulating in memory.
>>#I've seen some discussion on these objects in the developer-newsgroup
>>that had taken place some time ago.
>>I am afraid this is some kind of runaway caching I have to deal with.
>>Maybe not  correctly addressed in this newsgroup, after all...
>>
>>Anyway: any idea if there is an API command to re-init caches?
>>
>>Thanks,
>>
>>Daniel
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>    
>>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>  
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Pete Lewis <pe...@uptima.co.uk>.
Hi all

I've been reading the thread with interest; there is another way I've seen out
of memory errors come up when indexing large batches of documents.

If you have your heap space settings too high, then you get swapping (which
impacts performance), and you never reach the trigger for garbage
collection, so nothing gets collected and you eventually run out of memory.

Can you check whether or not your garbage collection is being triggered?
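
A quick way to check is to start the indexer with -verbose:gc and watch whether collections show up at all (the IBM JVMs understand that flag too, as far as I remember), or to log the heap around an explicit collection. The little probe below is only a sketch, using nothing but the standard java.lang API:

// Minimal probe: log heap usage before and after an explicit collection.
// If "used" barely drops from batch to batch, something is still holding references.
public class GcProbe {

    public static void logHeap(String label) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println(label + ": used=" + (used / 1024) + " KB, total="
                + (rt.totalMemory() / 1024) + " KB");
    }

    public static void main(String[] args) {
        logHeap("before gc");
        System.gc();            // only a hint to the VM, but usually honoured
        logHeap("after gc");
    }
}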

Anomalously, therefore, if this is the case, reducing the heap space can
both improve performance and get rid of the out of memory errors.

Cheers
Pete Lewis

----- Original Message ----- 
From: "Daniel Taurat" <da...@gaussvip.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Friday, September 10, 2004 1:10 PM
Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number of
documents


> Daniel Aber schrieb:
>
> >On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
> >
> >
> >
> >>I am facing an out of memory problem using  Lucene 1.4.1.
> >>
> >>
> >
> >Could you try with a recent CVS version? There has been a fix about files
> >not being deleted after 1.4.1. Not sure if that could cause the problems
> >you're experiencing.
> >
> >Regards
> > Daniel
> >
> >
> >
> Well, it seems not to be files, it looks more like those SegmentTermEnum
> objects accumulating in memory.
> #I've seen some discussion on these objects in the developer-newsgroup
> that had taken place some time ago.
> I am afraid this is some kind of runaway caching I have to deal with.
> Maybe not  correctly addressed in this newsgroup, after all...
>
> Anyway: any idea if there is an API command to re-init caches?
>
> Thanks,
>
> Daniel
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Daniel Taurat <da...@gaussvip.com>.
Daniel Aber schrieb:

>On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>
>  
>
>>I am facing an out of memory problem using  Lucene 1.4.1.
>>    
>>
>
>Could you try with a recent CVS version? There has been a fix about files 
>not being deleted after 1.4.1. Not sure if that could cause the problems 
>you're experiencing.
>
>Regards
> Daniel
>
>  
>
Well, it seems not to be files; it looks more like those SegmentTermEnum
objects accumulating in memory.
I've seen some discussion of these objects in the developer newsgroup
some time ago.
I am afraid this is some kind of runaway caching I have to deal with.
Maybe it is not correctly addressed in this newsgroup, after all...

Anyway: any idea if there is an API command to re-init caches?

Thanks,

Daniel
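
As far as I know there is no public call in 1.4.x to re-initialise those caches; the SegmentTermEnum instances are held by each open IndexReader / IndexSearcher, so one thing worth double-checking is that every reader or searcher opened during re-indexing is closed again, ideally in a finally block. A bare sketch against the 1.4-era API (the class, the "uid" field and the method name are illustrative only):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

// Sketch: always close the reader, even if the work done with it throws.
// The "uid" field and the method name are illustrative, not from real code.
public class ReaderHygiene {

    public static void removeByUid(String indexPath, String uid) throws IOException {
        IndexReader reader = IndexReader.open(indexPath);
        try {
            reader.delete(new Term("uid", uid));   // or whatever work the reader is for
        } finally {
            reader.close();   // lets the per-segment term enumerators be collected
        }
    }
}

The try/finally just guarantees the close happens even when the work with the reader throws.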



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Posted by Daniel Naber <da...@t-online.de>.
On Thursday 09 September 2004 19:47, Daniel Taurat wrote:

> I am facing an out of memory problem using  Lucene 1.4.1.

Could you try with a recent CVS version? There has been a fix for files
not being deleted that went in after 1.4.1. Not sure if that could cause the problems
you're experiencing.

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org