Posted to pylucene-dev@lucene.apache.org by Moshe Cohen <mo...@gmail.com> on 2009/05/20 02:20:39 UTC

Java Memory errors and "too many open files" when using Pylucene

Hi,
I would like to know if there are any PyLucene-specific issues behind
the two JVM errors in the subject.

I have a program maintaining a small (18 M) index that does a lot of
indexing and deleting and some optimizing.
In various tests with various parameters, tuned per the Lucene documentation
on these errors, I always end up hitting one of the two.
The index never has more than a few files in it.
It seems as if there is some resource leak accumulating.

Thank you for any info.

Moshe

Re: Java Memory errors and "too many open files" when using Pylucene

Posted by Andi Vajda <va...@apache.org>.
On Wed, 20 May 2009, Moshe Cohen wrote:

> Thanks.
> Version being used: 2.4.1.
> I have already tried most of the well-documented Lucene ideas. The seemingly
> weird thing is that the index is always quite small. I have experience of
> much larger indices in Solr with no such errors.
>
> I started with a memory error; after increasing the JVM heap at init I got
> the "too many open files" error, then increased the OS limit and got a
> memory error again :-)
> Of course, I got further along at each stage, but ultimately I hit an error.
> I can work around the problem by just restarting the program. This is what
> led me to suspect resource leaks specific to PyLucene.

If it's a small enough program, it might be interesting to see if you can 
reproduce the problem in pure Java.

> Are there any useful monitoring functions that can retrieve the resource
> usage state along the way?

PyLucene is only a wrapper around Java Lucene and the JVM. The one thing you 
can track in that context is how many Java objects escaped the JVM to Python 
and how many references Python holds to them. Use env._dumpRefs(); env is 
what initVM() returns. _dumpRefs() dumps the hashtable of Java objects that 
escaped the VM to Python, listing their java.lang.System.identityHashCode() 
and how many references Python holds to each of them.

If that dump grows beyond reason, you have a clue about what could be 
going wrong, provided you can then track down what the actual objects in 
question are (log their identityHashCode() when you use them, for example). 
If it doesn't grow, then the problem is most likely on the Java side, and 
rewriting your program in pure Java is going to help with debugging this.
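As a rough illustration of the tracking described above: the dicts below stand in for two snapshots of the escaped-object table (identityHashCode -> Python reference count). Turning env._dumpRefs() output into this shape is assumed, not shown; the comparison step itself is plain Python.

```python
# Sketch: spot objects whose Python-held reference count keeps growing
# between two snapshots. The snapshot dicts mimic what one might parse
# out of env._dumpRefs() output (identityHashCode -> reference count);
# that parsing is hypothetical and omitted here.

def growing_refs(before, after, threshold=0):
    """Return {hash: (old_count, new_count)} for entries that grew."""
    return {h: (before.get(h, 0), n)
            for h, n in after.items()
            if n - before.get(h, 0) > threshold}

# Two hypothetical snapshots taken between indexing batches.
snapshot_1 = {0x1a2b: 2, 0x3c4d: 1}
snapshot_2 = {0x1a2b: 2, 0x3c4d: 7, 0x5e6f: 3}

# Entries 0x3c4d and 0x5e6f grew; 0x1a2b stayed flat.
print(growing_refs(snapshot_1, snapshot_2))
```

If the same hash codes keep turning up with ever-larger counts across batches, those are the objects to chase down on the Python side.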

Andi..

Re: Java Memory errors and "too many open files" when using Pylucene

Posted by Moshe Cohen <mo...@gmail.com>.
Thanks.
Version being used: 2.4.1.
I have already tried most of the well-documented Lucene ideas. The seemingly
weird thing is that the index is always quite small. I have experience of
much larger indices in Solr with no such errors.

I started with a memory error; after increasing the JVM heap at init I got
the "too many open files" error, then increased the OS limit and got a
memory error again :-)
Of course, I got further along at each stage, but ultimately I hit an error.
I can work around the problem by just restarting the program. This is what
led me to suspect resource leaks specific to PyLucene.

Are there any useful monitoring functions that can retrieve the resource
usage state along the way?

Moshe



On Wed, May 20, 2009 at 7:31 PM, Andi Vajda <va...@apache.org> wrote:

>
> On Wed, 20 May 2009, Moshe Cohen wrote:
>
>> I would like to know if there are any PyLucene-specific issues behind
>> the two JVM errors in the subject.
>
>
> About memory errors: be sure to give the Java VM enough memory when
> initializing it with initVM(). More about this at [1].
>
> About open files: is your index using the Lucene compound file format? If
> not, switching to it could help. If you are already, what is your OS and its
> open-file limit? Have you tried increasing it? Are you closing all 'things'
> that you think ought to be closed? What version of PyLucene and Java Lucene
> are you using?
>
> Unless there is an egregious leak somewhere, both issues should reproduce
> with the same Java program (PyLucene just wraps a Java VM, the same Java
> Lucene code is run). You may want to ask the same questions on
> java-user@lucene.apache.org [2].
>
> Andi..
>
> [1] http://lucene.apache.org/pylucene/jcc/documentation/readme.html#api
> [2] http://lucene.apache.org/java/docs/mailinglists.html
>

Re: Java Memory errors and "too many open files" when using Pylucene

Posted by Andi Vajda <va...@apache.org>.
On Wed, 20 May 2009, Moshe Cohen wrote:

> I would like to know if there are any PyLucene-specific issues behind
> the two JVM errors in the subject.

About memory errors: be sure to give the Java VM enough memory when 
initializing it with initVM(). More about this at [1].
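For reference, a minimal sketch of passing heap settings when starting the VM; the sizes are placeholders to adjust for your workload, and the keyword arguments are the ones documented in the JCC readme [1]:

```python
import lucene

# Give the embedded JVM explicit heap bounds at startup; 128m/512m are
# placeholder values, not recommendations.
env = lucene.initVM(initialheap='128m', maxheap='512m')
```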

About open files: is your index using the Lucene compound file format? If 
not, switching to it could help. If you are already, what is your OS and its 
open-file limit? Have you tried increasing it? Are you closing all 'things' 
that you think ought to be closed? What version of PyLucene and Java Lucene 
are you using?
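On the OS-limit question, one way to check (and raise) the per-process open-file limit from within Python on a POSIX system, using only the standard library -- the values reported are whatever your system happens to allow:

```python
import resource

# Query the current per-process open-file limits (POSIX only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft limit {soft}, hard limit {hard}")

# An unprivileged process may raise its soft limit up to the hard limit;
# going beyond the hard limit requires privileges.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

Logging these limits alongside the failure would tell you how close the process actually gets before the "too many open files" error fires.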

Unless there is an egregious leak somewhere, both issues should reproduce 
with the same Java program (PyLucene just wraps a Java VM, the same Java 
Lucene code is run). You may want to ask the same questions on 
java-user@lucene.apache.org [2].

Andi..

[1] http://lucene.apache.org/pylucene/jcc/documentation/readme.html#api
[2] http://lucene.apache.org/java/docs/mailinglists.html