Posted to java-user@lucene.apache.org by Tamara Bobic <ta...@scai.fraunhofer.de> on 2011/10/18 18:21:32 UTC

OutOfMemoryError

Hi all,

I am using Lucene to query Medline abstracts and as a result I get around 3 million hits. Each of the hits is processed and information from a certain field is used.

After a certain number of hits, somewhere around 1 million (not always the same number), I get an OutOfMemoryError that looks like this:

Exception in thread "main" java.lang.OutOfMemoryError
	at java.util.zip.Inflater.inflateBytes(Native Method)
	at java.util.zip.Inflater.inflate(Inflater.java:221)
	at java.util.zip.Inflater.inflate(Inflater.java:238)
	at org.apache.lucene.document.CompressionTools.decompress(CompressionTools.java:108)
	at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:609)
	at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:385)
	at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:231)
	at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:1013)
	at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:520)
	at org.apache.lucene.index.FilterIndexReader.document(FilterIndexReader.java:149)
	at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)
	at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:152)
	at org.apache.lucene.search.MultiSearcher.doc(MultiSearcher.java:156)
	at org.apache.lucene.search.Hits.doc(Hits.java:180)
	at de.fhg.scai.bio.tamara.corpusBuilding.LuceneCmdLineInterface.queryMedline(LuceneCmdLineInterface.java:178)
	at de.fhg.scai.bio.tamara.corpusBuilding.LuceneCmdLineInterface.main(LuceneCmdLineInterface.java:152)


The line that causes the problem is:
String docText = hits.doc(j).getField("DOCUMENT").stringValue() ; 

I am using Java 1.6 and tried different garbage collectors (-XX:+UseParallelGC and -XX:+UseParallelOldGC), but it didn't help.

Does anyone have any idea how to solve this problem?

There is also an official bug report:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6293787

Help is much appreciated. :)

Best regards,
Tamara Bobic

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: OutOfMemoryError

Posted by Mead Lai <la...@gmail.com>.
Tamara,

You could use a StringBuffer instead of
String docText = hits.doc(j).getField("DOCUMENT").stringValue();
and call StringBuffer.delete() afterwards to release the memory.
Another option is to run a 64-bit JVM.

Regards,
Mead



Re: OutOfMemoryError

Posted by Tamara Bobic <ta...@scai.fraunhofer.de>.
Thank you all (Otis, Mead, Uwe) for your replies! 

It was very helpful and the problem turned out to be very trivial. I was running 32-bit java instead of 64-bit and not enough memory could be reserved. 
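
For anyone hitting the same wall, a minimal, self-contained check (the class name here is just an example) of which data model the JVM is running and how much heap it will actually grant:

```java
// Example check for JVM bitness and maximum heap; the class name is arbitrary.
public class JvmCheck {
    public static void main(String[] args) {
        // "sun.arch.data.model" reports "32" or "64" on HotSpot JVMs;
        // fall back to "os.arch" on other JVMs.
        String bits = System.getProperty("sun.arch.data.model",
                System.getProperty("os.arch"));
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM data model: " + bits);
        System.out.println("Max heap (MB): " + maxHeapMb);
    }
}
```

A 32-bit HotSpot JVM typically cannot address more than roughly 2-4 GB of heap, regardless of what -Xmx requests.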

Thanks once again, I finally managed to do the whole run successfully :)

All the best,
Tamara




Re: OutOfMemoryError

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Bok Tamara,

You didn't say what -Xmx value you are using.  Try a somewhat higher value.  Note that loading field values (and this one looks like it may be big, because it is compressed) from a lot of hits is not recommended.
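
For example, a launch command along these lines (the jar path is a placeholder; the main class is taken from the stack trace) gives the JVM an explicit 2 GB heap:

```shell
# Hypothetical invocation: -Xmx raises the heap ceiling; GC flags alone do not.
java -Xmx2g -cp lucene-core.jar:. \
  de.fhg.scai.bio.tamara.corpusBuilding.LuceneCmdLineInterface
```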

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



RE: OutOfMemoryError

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

> ...I get around 3
> million hits. Each of the hits is processed and information from a certain field is
> used.

That is of course fine, but:

> After certain number of hits, somewhere around 1 million (not always the same
> number) I get OutOfMemory exception that looks like this:

You did not tell us *how* you get the hits. If you do something like Searcher.search(query, 1000000), it can easily overflow memory (sooner or later, maybe while decompressing results, maybe somewhere else). Lucene always collects the "top-ranking" results, and to do that it uses a priority queue; passing 1 million or more as the number of top-ranking results makes that queue use insane amounts of memory. Like most full-text search engines, Lucene is optimized for quickly getting the best results. Fetching *all* possible hits is not really the intended use case of a full-text search engine (especially as hits that far down the ranking are in most cases no longer relevant to your query).

To really collect all hits (but in arbitrary order, not sorted by relevance), write your own Collector implementation that collects the results and pass it to the searcher. There are several code samples on this mailing list.
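
A sketch of such a Collector against the Lucene 3.x API (the class name is hypothetical; it gathers only docIDs, so documents can then be loaded one at a time):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Collects every matching docID (unscored, in index order) instead of
// maintaining a huge top-N priority queue.
public class AllDocsCollector extends Collector {
    private final List<Integer> docIds = new ArrayList<Integer>();
    private int docBase;

    @Override
    public void setScorer(Scorer scorer) {
        // Scores are not needed, so the scorer is ignored.
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) {
        this.docBase = docBase; // docID offset of the current segment
    }

    @Override
    public void collect(int doc) {
        docIds.add(docBase + doc); // store only the global docID
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true; // order is irrelevant for a bulk export
    }

    public List<Integer> getDocIds() {
        return docIds;
    }
}
```

You would then call searcher.search(query, new AllDocsCollector()) and fetch each document individually via searcher.doc(id), so only one decompressed "DOCUMENT" field is held in memory at a time.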

Another approach is to use the new searchAfter method, available in the next Lucene version (not yet released).
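
Once that method is available (it ships as IndexSearcher.searchAfter in Lucene 3.5), a paging loop could look roughly like this (query, searcher, and process are assumed to exist in the surrounding code):

```java
// Hypothetical paging loop over all hits, 1000 at a time, so the
// internal priority queue stays small. Assumes Lucene 3.5+ searchAfter.
ScoreDoc after = null;
while (true) {
    TopDocs page = (after == null)
            ? searcher.search(query, 1000)
            : searcher.searchAfter(after, query, 1000);
    if (page.scoreDocs.length == 0) {
        break; // no more hits
    }
    for (ScoreDoc sd : page.scoreDocs) {
        process(searcher.doc(sd.doc)); // load one document at a time
    }
    after = page.scoreDocs[page.scoreDocs.length - 1];
}
```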

Uwe

