Posted to java-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2009/06/13 00:09:19 UTC

memory leak with CustomComparatorSource class variables

Hey there,
I am seeing what looks like a memory leak with a
CustomComparatorSource (which implements SortComparatorSource).
It has a HashMap declared as an instance field:

final HashMap<String,Integer> docs_to_modify

This HashMap contains document ids and the priorities used for sorting (it
is assigned in the constructor).
Its contents normally depend on the query string requested, and it
normally holds about 5000 elements.

I have noticed that this HashMap causes a memory leak. The GC always leaves
some memory in use because of this structure. The more requests are served,
the more memory stays in use (after lots of debugging and tracing I know the
leak is that HashMap), until Tomcat fails with a heap space
OutOfMemoryError.

It looks like the fields of a CustomComparatorSource are never freed. I
suspect this could happen if IndexSearcher keeps an instance of
CustomComparatorSource and never releases it...

Any clue why this happens?

My Comparator looks like:
class CustomComparatorSource implements SortComparatorSource 
{
  private final HashMap<String,Integer> docs_to_modify;
  
  public CustomComparatorSource( HashMap<String,Integer> docs_map) {
    this.docs_to_modify = docs_map;
  }
  
  public ScoreDocComparator newComparator(final IndexReader index_reader,
      String fieldname) throws IOException
  {
    final FieldCache.StringIndex index =
        FieldCache.DEFAULT.getStringIndex(index_reader, fieldname);
  
    return new ScoreDocComparator () 
    {
      public final int compare (final ScoreDoc d0, final ScoreDoc d1) {
        //... algorithm used to compare
        
        return value_A - value_B;
      }
  
      public Comparable sortValue (final ScoreDoc d0) {
        //... algorithm used to sort value     
        return new Integer( value_A );
      }
  
      public int sortType() {
        return SortField.CUSTOM;
      }
    };
  }
}

And it's called from:
new Sort( new SortField[] { new SortField(idField,
    new CustomComparatorSource(docs_map), false ) } )

Thanks in advance!

-- 
View this message in context: http://www.nabble.com/memory-leak-with-CustomComparatorSource-class-variables-tp24006806p24006806.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: memory leak with CustomComparatorSource class variables

Posted by Michael McCandless <lu...@mikemccandless.com>.
It's here:

  http://lucene.apache.org/java/docs/nightly/

But remember this is trunk code, i.e. not yet released, so stuff is
still changing.

Mike

On Sat, Jun 13, 2009 at 9:30 AM, Marc Sturlese<ma...@gmail.com> wrote:
>
> Thanks Mike, really useful info. I have downloaded the latest Lucene 2.9-dev
> to test the implementation of a FieldComparatorSource but the API
> documentation doesn't seem to be available.
>
> I can access the class MissingStringLastComparatorSource:
> http://lucene.apache.org/solr/api/org/apache/solr/search/MissingStringLastComparatorSource.html
> From there I try to follow the link to org.apache.lucene.search.FieldComparatorSource
> but get a 404 error.
> Any idea how I can get access to that documentation?
>
> Thanks in advance!


Re: memory leak with CustomComparatorSource class variables

Posted by Yonik Seeley <yo...@lucidimagination.com>.
When implementing your own, it also helps to look at the existing
implementations in the FieldComparator class:

http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/FieldComparator.java?revision=764551

-Yonik
http://www.lucidimagination.com


On Sat, Jun 13, 2009 at 9:30 AM, Marc Sturlese<ma...@gmail.com> wrote:
>
> Thanks Mike, really useful info. I have downloaded the latest Lucene 2.9-dev
> to test the implementation of a FieldComparatorSource but the API
> documentation doesn't seem to be available.
>
> I can access the class MissingStringLastComparatorSource:
> http://lucene.apache.org/solr/api/org/apache/solr/search/MissingStringLastComparatorSource.html
> From there I try to follow the link to org.apache.lucene.search.FieldComparatorSource
> but get a 404 error.
> Any idea how I can get access to that documentation?
>
> Thanks in advance!


Re: memory leak with CustomComparatorSource class variables

Posted by Marc Sturlese <ma...@gmail.com>.
Thanks Mike, really useful info. I have downloaded the latest Lucene 2.9-dev
to test the implementation of a FieldComparatorSource but the API
documentation doesn't seem to be available.

I can access the class MissingStringLastComparatorSource:
http://lucene.apache.org/solr/api/org/apache/solr/search/MissingStringLastComparatorSource.html
From there I try to follow the link to org.apache.lucene.search.FieldComparatorSource
but get a 404 error.
Any idea how I can get access to that documentation?

Thanks in advance!




Re: memory leak with CustomComparatorSource class variables

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Fri, Jun 12, 2009 at 6:09 PM, Marc Sturlese<ma...@gmail.com> wrote:

> I have noticed I am experiencing sort of a memory leak with a
> CustomComparatorSource (which implements SortComparatorSource).
> I have a HashMap declared as variable of class in CustomComparatorSource:

This is unfortunately a known and rather horrific trap in Lucene.

Lucene's field sorting implementation (FieldSortedHitQueue) caches the
comparators used during sorting (in its static package-private
Comparators field).  They are weakly keyed by IndexReader, so that
when the IndexReader is closed, the cache entries are cleared.  When
sorting by field this is normally OK since we hold a single entry for
that field.

But when you provide a SortComparatorSource, it's included in the
cache key.  So, unless you implement hashCode/equals (correctly), or
use a singleton or restricted set of instances (say), every new
instance of your SortComparatorSource will enter the cache
and not be cleared until you close that reader.  It easily results in
a catastrophic, extremely unexpected memory leak.
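
To make the effect concrete, here is a minimal sketch in plain Java that
models the behaviour described above. It does not use Lucene at all:
CacheKey, IdentitySource, ValueSource and cachedComparators are hypothetical
stand-ins, and the real key class inside FieldSortedHitQueue may look quite
different. The only point is how a cache keyed on the comparator source
behaves with and without equals/hashCode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;

class ComparatorCacheSketch {

    /** Stand-in for a comparator source that inherits Object's identity equals/hashCode. */
    static final class IdentitySource {
        final Map<String, Integer> priorities;
        IdentitySource(Map<String, Integer> priorities) { this.priorities = priorities; }
    }

    /** Same data, but two sources are equal when their priority maps are equal. */
    static final class ValueSource {
        private final Map<String, Integer> priorities;
        ValueSource(Map<String, Integer> priorities) {
            this.priorities = new HashMap<>(priorities); // copy so the key cannot change later
        }
        @Override public boolean equals(Object o) {
            return o instanceof ValueSource && priorities.equals(((ValueSource) o).priorities);
        }
        @Override public int hashCode() { return priorities.hashCode(); }
    }

    /** Hypothetical cache key: the field name plus the comparator source. */
    static final class CacheKey {
        final String field;
        final Object source;
        CacheKey(String field, Object source) { this.field = field; this.source = source; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof CacheKey)) return false;
            CacheKey other = (CacheKey) o;
            return field.equals(other.field) && source.equals(other.source);
        }
        @Override public int hashCode() { return Objects.hash(field, source); }
    }

    /** Runs `requests` sorts against one open reader and returns the cached entry count. */
    static int cachedComparators(int requests, boolean valueEquality) {
        Object reader = new Object(); // stand-in for one long-lived IndexReader
        Map<Object, Map<CacheKey, Object>> cache = new WeakHashMap<>(); // weakly keyed by reader
        for (int i = 0; i < requests; i++) {
            Map<String, Integer> priorities = new HashMap<>();
            priorities.put("doc-1", 10);
            priorities.put("doc-2", 5);
            Object source = valueEquality
                    ? new ValueSource(priorities)
                    : new IdentitySource(priorities);
            cache.computeIfAbsent(reader, r -> new HashMap<>())
                 .putIfAbsent(new CacheKey("id", source), new Object());
        }
        return cache.get(reader).size();
    }

    public static void main(String[] args) {
        System.out.println(cachedComparators(1000, false)); // prints 1000
        System.out.println(cachedComparators(1000, true));  // prints 1
    }
}
```

Note that value-based equality only helps when the same priorities recur
across requests. If every query really produces a different map, each one is
still a distinct key, and the entries stay until the reader is closed.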

Lucene 2.9 has fixed this, as a side effect of the move to per-segment
field sorting.  SortComparatorSource is replaced by
FieldComparatorSource, and this caching of comparators is no longer
done.  Another reason to get 2.9 out sooner rather than later!

Mike
