Posted to dev@lucene.apache.org by Gagandeep singh <ga...@gmail.com> on 2013/08/07 04:19:56 UTC

OutOfMemoryError in TermContext.build

Hi folks

I am running a MapReduce job that generates More Like This results for a list
of documents. For every source document, a set of important words is picked by
building a TermQuery for each word and inspecting that query's score via
searcher.explain().
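For reference, the per-word scoring step looks roughly like this (a simplified sketch; the method name scoreWord and the field parameter are illustrative, the real code is in MLTQueryParser.computeScoreForWordOrPhrase):

```java
import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

class WordScorer {
    // Score a single candidate word against one document by building a
    // TermQuery and asking the searcher to explain its score for that doc.
    static float scoreWord(IndexSearcher searcher, String field, String word,
                           int docId) throws IOException {
        TermQuery query = new TermQuery(new Term(field, word));
        Explanation explanation = searcher.explain(query, docId);
        return explanation.getValue();
    }
}
```

This is called once per candidate word per source document, so a large word list means many createWeight()/TermContext.build() calls.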

My index is 362 MB on disk and I have started Solr with 4 GB of memory.
However, I'm getting this exception, which I'm not able to understand:
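For reference, Solr is started along these lines (the exact flags and start script are simplified here):

```shell
# assumed invocation; only the 4 GB heap setting matters for this question
java -Xmx4g -Xms4g -jar start.jar
```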

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.<init>(BlockTreeTermsReader.java:2266)
        at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.getFrame(BlockTreeTermsReader.java:1414)
        at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.pushFrame(BlockTreeTermsReader.java:1445)
        at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BlockTreeTermsReader.java:1658)
        at org.apache.lucene.index.TermContext.build(TermContext.java:95)
        at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:166)
        at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.<init>(ConstantScoreQuery.java:101)
        at org.apache.lucene.search.ConstantScoreQuery.createWeight(ConstantScoreQuery.java:270)
        at com.bloomreach.rp.ScoreSquashingQuery.createWeight(ScoreSquashingQuery.java:46)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
        at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
        at com.bloomreach.rp.ScoreSquashingQuery.createWeight(ScoreSquashingQuery.java:46)
        at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:675)
        at org.apache.lucene.search.IndexSearcher.explain(IndexSearcher.java:643)
        at org.apache.solr.search.SolrIndexSearcher.explain(SolrIndexSearcher.java:2058)
        at com.bloomreach.rp.MLTQueryParser.computeScoreForWordOrPhrase(MLTQueryParser.java:321)

I tried digging into the code for TermQuery.createWeight(); a comment there
says it is caching the term lookup:

     public Weight createWeight(IndexSearcher searcher) throws IOException {
        ...
        // make TermQuery single-pass if we don't have a PRTS or if the context differs!
        termState = TermContext.build(context, term, true); // cache term lookups!


But how can this cause an OOM? I thought the index was already a map from
term to doc, field, and position that is loaded in memory; why is this
loading anything extra into memory?

Thanks
Gagan