Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2006/11/30 08:01:41 UTC

2.1-dev memory leak?

Hi,

Is anyone running the Lucene trunk/HEAD version in a serious production system?  Has anyone noticed any memory leaks?

I'm asking because I recently (bravely) upgraded from 1.9.1 to 2.1-dev (trunk from about a week ago), and all of a sudden my application, which previously consumed about 1.5GB (-Xmx1500m), now consumes 2.2GB. After running for about 3-6 hours and handling several tens of thousands of queries, it exhausts the whole heap, the GC can't free any more room, and the application blows up.

I'd love to go back to 2.0.0, or even to 1.9.1, and run that for a while just to double-check that the Lucene upgrade really is the source of the leak, but unfortunately, because of LUCENE-701 (lockless commits), I can't easily go back without reindexing...

Moreover, I just looked at CHANGES.txt from 1.9.1 to present, and I think the biggest change since then was LUCENE-701.
LUCENE-672 (segment merge policy) was also pretty big, but from what I can tell, the memory leak is somewhere in the search part, not the indexing part.  There have been a number of other search-time optimizations since 2.0.0, so it's hard to tell what the cause is.  Of course, it could turn out to be a leak in my own code, but I'm pretty sure my changes were limited to removing deprecated methods so that I could start using 2.1.

        IndexDescriptor indexDescriptor = getIndexDescriptorFromCache(indexID);

        try {
            // if this is a known index
            if (indexDescriptor != null) {
                cacheHits++;
                // if the index has changed since this Searcher was created, make a new Searcher
                long currentVersion = IndexReader.getCurrentVersion(indexID);
                if (currentVersion > indexDescriptor.lastKnownVersion) {
                    hitButChanged++;
                    // modified index detected
                    indexDescriptor.lastKnownVersion = currentVersion;
                    indexDescriptor.searcher = new LuceneSearcher(new File(indexID));
                }
                else {
                    // index not modified, reusing searcher
                }
            }
            // if this is a new index
            else {
                cacheMisses++;
                File indexDir = validateIndex(indexID);
                indexDescriptor = new IndexDescriptor();
                indexDescriptor.indexDir = indexDir;
                indexDescriptor.lastKnownVersion = IndexReader.getCurrentVersion(indexDir);
                indexDescriptor.searcher = new LuceneSearcher(indexDir);
            }
            return cacheIndexDescriptor(indexDescriptor);
        }
        catch (IOException e) {
            throw new SearcherException("Cannot open index: " + indexID, e);
        }

So this is just caching of "IndexDescriptor" objects, which have "LuceneSearcher" objects in them.
The cache is a small LRU cache with a max size of 37.  The app actually works with a few tens of thousands of Lucene indices, so this small cache yields only a 20% cache hit ratio.
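Since nothing in this setup ever calls close(), one common pattern for a small LRU cache of searchers is to close whatever the cache evicts. The sketch below shows that pattern with an access-ordered java.util.LinkedHashMap. It is not Otis's cache and not a Lucene API: ClosingLruCache is a hypothetical name, and the values are assumed to be some java.io.Closeable wrapper around a searcher (Lucene 2.x targets Java 1.4, so its IndexSearcher would typically need a small wrapper to fit this type).

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical LRU cache sketch: an access-ordered LinkedHashMap that closes
 * a value when it is evicted, so open searchers do not pile up waiting for
 * the garbage collector.
 */
class ClosingLruCache<K, V extends Closeable> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    ClosingLruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, so iteration order is LRU first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after a new entry has been inserted.
        if (size() <= maxEntries) {
            return false;
        }
        try {
            eldest.getValue().close(); // release the evicted searcher's resources
        } catch (IOException e) {
            // A real cache would log this; the entry is evicted either way.
        }
        return true; // tell LinkedHashMap to drop the least recently used entry
    }
}
```

With the size from the post this would be `new ClosingLruCache<String, SomeSearcherWrapper>(37)`, where the wrapper type is hypothetical. Two caveats: LinkedHashMap is not thread-safe, so a multi-threaded server would need to synchronize access to it, and closing on eviction is only safe if no other thread is still running a query on the evicted searcher. Replacing the value for an existing key with put() does not trigger eviction, so the caller would have to close the old value it gets back.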

And then the LuceneSearcher ctor looks like this:

    LuceneSearcher(File indexDir) throws IOException {
        _indexDir = FSDirectory.getDirectory(indexDir, false);
        _searcher = new IndexSearcher(_indexDir);
    }

This _searcher (IndexSearcher) is then used in various search methods of this class.
There are no close() calls anywhere.  In other words, I don't explicitly close IndexSearchers; I just let them be garbage collected.
This code has been working well for 1-2 years, and I only started exhausting the JVM heap about a week ago, when I went from 1.9.1 to 2.1-dev.
Any other overly brave/crazy souls out there who are running the bleeding edge version in production environment?
This is running on Fedora Core 3 under JDK 1.5_09 (the latest 1.5).
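In the cache-refresh snippet earlier in this message, a changed index gets a new searcher, and the one it replaces is simply dropped without being closed. A minimal sketch of closing the stale searcher at that point follows. CachedIndex, SearcherOpener and SearcherRefresher are hypothetical names, java.io.Closeable stands in for whatever exposes the searcher's close(), and nothing in this thread establishes that the missing close() calls are the cause of the leak.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for the post's IndexDescriptor. */
class CachedIndex {
    long lastKnownVersion;
    Closeable searcher; // e.g. a wrapper around an IndexSearcher
}

/** Hypothetical factory; in the post this role is played by new LuceneSearcher(...). */
interface SearcherOpener {
    Closeable open() throws IOException;
}

class SearcherRefresher {
    /**
     * Replaces the cached searcher when the index version has advanced and
     * closes the stale one instead of leaving it to the garbage collector.
     * Returns true if a new searcher was opened.
     */
    static boolean refreshIfChanged(CachedIndex index, long currentVersion,
                                    SearcherOpener opener) throws IOException {
        if (currentVersion <= index.lastKnownVersion) {
            return false; // index not modified: keep reusing the current searcher
        }
        // Open the new searcher first, so a failure here leaves the old one in place.
        Closeable fresh = opener.open();
        Closeable stale = index.searcher;
        index.searcher = fresh;
        index.lastKnownVersion = currentVersion;
        if (stale != null) {
            stale.close(); // release the old searcher's open files and memory
        }
        return true;
    }
}
```

Closing immediately is only safe when no other thread is still searching with the stale searcher; a busy multi-threaded server would usually need reference counting or a deferred close instead.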

Thanks,
Otis




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: 2.1-dev memory leak?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Otis Gospodnetic wrote:
> Hi,
> 
> Is anyone running Lucene trunk/HEAD version in a serious production system?  Anyone noticed any memory leaks?
> 
> I'm asking because I recently bravely went from 1.9.1 to 2.1-dev (trunk from about a week ago) and all of a sudden my application that was previously consuming about 1.5GB (-Xmx1500m) now consumes 2.2GB, and blows up after it exhausts the whole heap and the GC can't make any more room there after running for about 3-6 hours and handling several tens of thousands of queries.

Whoa, I'm sorry to hear this Otis :(

> I'd love to go back to 2.0.0, or even back to 1.9.1 and run that for a while and just double-check that it really is the Lucene upgrade that is the source of the leak, but unfortunately because of LUCENE-701 (lockless commits), I can't go back that easily without reindexing...
> 
> Moreover, I just looked at CHANGES.txt from 1.9.1 to present, and I think the biggest change since then was LUCENE-701.

The file-format changes for lockless commits are small enough that
making a tool to back-convert a lockless format index into a
pre-lockless format index (so that Lucene 2.0 can read/write to it) is
fairly simple.

OK, I coded up a first version.  I will open a JIRA issue and attach a
patch.

We clearly need to also get to the bottom of where the memory leak is,
but I think first priority is to stabilize your production
environment.  Hopefully this tool can at least get you back up in
production and then also enable us to narrow down where the memory
leak is.

Please tread carefully though: it makes me very nervous that this tool
I just created would be used in your production environment!
Obviously first test it in a sandbox, running against your production
index(es).

Mike
