Posted to pylucene-dev@lucene.apache.org by Aaron Lav <as...@pobox.com> on 2009/02/05 20:45:21 UTC

Occasional SIGSEGV passing lists of ints to Java int[]s

It seems like there's a bug which sometimes causes passing lists of
ints to Java int[]s to generate a SIGSEGV.

The crash occurs in Python's listobject.c while iterating over a list
(verified by comparing the RIP reported in the hs_err_pid<pid> file
against a disassembly).  It tries to access a->ob_item[i], which turns
out to be NULL, and thus faults when trying to increment the reference
count with Py_INCREF(a->ob_item[i]);

I've verified this by looking through the list with gdb on a core
dump: the list pointer is available in a register, the object header
of the list has the right reference count, length, and type, and the
other elements in the list seem to have the right refcount (1), type 
(PyInt_Type), and plausible integer values.

This occurs for me under ubuntu/x86_64 with the Hotspot JVM
(reproduced on both gutsy/sun-java6-jdk/python2.4 and
intrepid/openjdk-6-jdk/python2.5.2), probably more often on
multicore/multiprocessor machines.  Building JCC with or without
NO_SHARED doesn't seem to make any difference.  It seems to happen
both with -Xcheck:jni as a VM option, and without, and with JCC
versions as late as r740966.

I'm attaching a test case (python and java files) to this email: I've
also put a tarball at http://www.pobox.com/~asl2/jcc-jarray.tar.bz2
which has a prebuilt jar, a build.sh script to build both the jar and
the JCC wrapper, a run.sh script to repeatedly run the test code
(it doesn't always fault, but I'd expect that if it doesn't do so
after 50-100 tries, it's not going to in that environment), and a sample
hs_err_pid logfile.
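
(For readers without the tarball: the rerun loop described above might look
roughly like the following Python 3 sketch.  It is not the run.sh from the
tarball; the function name and the default of 100 tries are assumptions made
for illustration.  Since a JVM crash may end the process through abort()
rather than a raw SIGSEGV, the sketch tallies every exit status instead of
looking for one signal.)

```python
import collections
import subprocess


def tally_exit_statuses(cmd, tries=100):
    """Run cmd repeatedly and count how often each exit status occurs.

    On POSIX, subprocess reports death by signal N as returncode -N,
    so a run killed by a signal shows up as a negative key.
    """
    statuses = collections.Counter()
    for _ in range(tries):
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        statuses[proc.returncode] += 1
    return statuses
```

A hypothetical call would be tally_exit_statuses(["python", "test_array.py"]);
any negative key in the result means at least one run died from a signal.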

Does this reproduce for anyone, or sound familiar?

   Thanks,
   Aaron Lav (asl2@pobox.com)




Re: Occasional SIGSEGV passing lists of ints to Java int[]s

Posted by Andi Vajda <va...@apache.org>.
On Thu, 5 Feb 2009, Aaron Lav wrote:

> On Thu, Feb 05, 2009 at 02:45:21PM -0500, Aaron Lav wrote:
>> It seems like there's a bug which sometimes causes passing lists of
>> ints to Java int[]s to generate a SIGSEGV.
>> ...
>
> I've noticed that a call to the wrapped functions doesn't seem to be
> necessary to generate the exception: all that's required is that the
> module be imported and the JVM initialized.  The list then seems to
> have an element set to NULL, and any access, whether from Python or
> from jcc trying to convert it to a JArray<int>, will fault.
>
> (I realized this while trying to figure out why my hardware
> watchpoints weren't triggering.)
>
> I'm attaching a revised test_array.py which still generates a SIGSEGV,
> and a gdb session.
>
> (The NULL tends to show up at one or two offsets, although which
> offsets those are may vary with the code and environment running, which
> is how I knew to look at 0x1a92a4 in the gdb session.)

Strange indeed.

I looked over this a bit but didn't try to reproduce it yet. I didn't see 
anything odd.

If the mere _presence_ of JCC and calling initVM() causes the arrays to get
corrupted, this could be hard to debug.

Have you tried moving things around, like creating the arrays differently ?
For example: a = list(xrange(count)) instead of the list comprehension ?
Did you try a more recent version of Python ? 
Have you tried this on 32-bit ?

When you don't actually use the arrays after creating them, how are you 
getting the crash ? inside the assert loops ?

Andi..


Re: Occasional SIGSEGV passing lists of ints to Java int[]s

Posted by Aaron Lav <as...@pobox.com>.
On Thu, Feb 05, 2009 at 02:45:21PM -0500, Aaron Lav wrote:
> It seems like there's a bug which sometimes causes passing lists of
> ints to Java int[]s to generate a SIGSEGV.
> ...

I've noticed that a call to the wrapped functions doesn't seem to be
necessary to generate the exception: all that's required is that the
module be imported and the JVM initialized.  The list then seems to
have an element set to NULL, and any access, whether from Python or
from jcc trying to convert it to a JArray<int>, will fault.

(I realized this while trying to figure out why my hardware
watchpoints weren't triggering.)

I'm attaching a revised test_array.py which still generates a SIGSEGV,
and a gdb session.

(The NULL tends to show up at one or two offsets, although which
offsets those are may vary with the code and environment running, which
is how I knew to look at 0x1a92a4 in the gdb session.)

   Aaron Lav (asl2@pobox.com)



Re: Occasional SIGSEGV passing lists of ints to Java int[]

Posted by Bill Janssen <ja...@parc.com>.
I've seen crashes like this, where the Java heap has taken all of the
space.  Python tries to get a bit for its own objects, and fails, and
crashes.

Bill

Re: Occasional SIGSEGV passing lists of ints to Java int[] - not JCC's fault

Posted by Andi Vajda <va...@apache.org>.
On Feb 16, 2009, at 11:13, Aaron Lav <as...@pobox.com> wrote:

> On Fri, Feb 06, 2009 at 07:58:31PM -0500, Aaron Lav wrote:
>> I've also tried modifying the output testjcc.c so it doesn't
>> contain the lines from INSTALL_TYPE(JObject,module) ... to
>> '__install__(module);', and it still seems to crash.  At this
>> point, the amount of JCC code running is really minimal ...
>
> To follow up, I've reproduced the crash without any JCC code, using
> the code at http://www.pobox.com/~asl2/python_jni_bug.tar.gz
>
> I thought everyone would like to know that JCC isn't at fault here.
> (The code in the tarfile initializes Python from within Java, so
> there's not even any call to JNI_CreateJavaVM and thus no code
> similarity at all.)

Thank you for the update. The plot thickens. Please, keep us posted as  
this gets elucidated...

Thanks !

Andi..

>
>
>   Aaron Lav (asl2@pobox.com)
>
>
>

Re: Occasional SIGSEGV passing lists of ints to Java int[] - not JCC's fault

Posted by Aaron Lav <as...@pobox.com>.
On Fri, Feb 06, 2009 at 07:58:31PM -0500, Aaron Lav wrote:
> I've also tried modifying the output testjcc.c so it doesn't
> contain the lines from INSTALL_TYPE(JObject,module) ... to '__install__(module);', and it still seems to crash.  At this point, the amount of JCC
> code running is really minimal ...

To follow up, I've reproduced the crash without any JCC code, using
the code at http://www.pobox.com/~asl2/python_jni_bug.tar.gz

I thought everyone would like to know that JCC isn't at fault here.
(The code in the tarfile initializes Python from within Java, so
there's not even any call to JNI_CreateJavaVM and thus no code similarity
at all.)

   Aaron Lav (asl2@pobox.com)





Re: Occasional SIGSEGV passing lists of ints to Java int[]

Posted by Aaron Lav <as...@pobox.com>.
On Fri, Feb 06, 2009 at 01:44:52PM -0800, Andi Vajda wrote:
>
> It might be time to fiddle with vm args then.
> Have you tried -Xrs ? increasing the Java stack (-Xms) ?
> Maybe Java is garbage collecting something it shouldn't ?

It still crashes with
vmargs="-Xms2G,-Xrs,-Xss2M" (-Xms increases heap, -Xss increases
stack).

I've also tried -verbose:gc, and it doesn't seem to report any gc
activity either when running normally or when crashing.  (I've
also tried -Xloggc:logFileName and -XX:+PrintGCDetails.)  Swapping
in a different GC algorithm with -XX:+UseSerialGC didn't seem to help.

(This is all with the code patched to return Py_None from initVM, so
it makes sense that we're not doing any allocation past startup, and
thus no GC.  It is possible that I'm wrong, and that when the process
crashes, there's unflushed output indicating a GC.)

I've also been fiddling with the python driver.  A version which asserts
that the just-inserted int is of type 'int' doesn't fault until we
check the entire array at the end: a version which checks the last 50
or so inserts does fault, again at one of a small set of offsets.
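
(The windowed check described above could look something like this sketch.
It is a guess at the shape of the driver, not the actual test script; the
function name and the default window of 50 are assumptions.  It is written
for Python 3, where range() takes the place of xrange().)

```python
def append_with_window_check(count, window=50):
    """Append 0..count-1, re-reading the last `window` items each time."""
    a = []
    for i in range(count):
        a.append(i)
        # Touch the most recent elements again; in the crashing
        # environment a NULL slot would fault on this access.
        start = max(0, len(a) - window)
        for j in range(start, len(a)):
            assert type(a[j]) is int, "bad element at index %d" % j
    return a
```

Re-reading only a recent window keeps the check cheap enough to run after
every append, while still touching the slots near each reallocation.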

Those offsets (0x14ff00, 0x12a9bc, 0x1a92bc) are near the series
of sizes at which Python reallocates a list (see Objects/listobject.c),
so it looks like perhaps the malloc arena is getting corrupted
in such a way as to not copy the last 8 bytes of data in a realloc'd
block?
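
(To compare an offset against that series, the capacities can be generated
from the over-allocation rule in list_resize() in CPython 2.x's
Objects/listobject.c: new_allocated = newsize + (newsize >> 3) +
(newsize < 9 ? 3 : 6).  The sketch below assumes a list grown one append at
a time from empty and treats the offsets as element indices; both are
assumptions, and the helper names are made up for illustration.)

```python
def append_capacities(limit):
    """Yield the successive capacities of a list grown by append()."""
    allocated = 0
    while allocated < limit:
        newsize = allocated + 1  # the append that overflows the old block
        allocated = newsize + (newsize >> 3) + (3 if newsize < 9 else 6)
        yield allocated


def nearest_capacity(index, limit=1 << 22):
    """Return the capacity in the series that lies closest to index."""
    return min(append_capacities(limit), key=lambda cap: abs(cap - index))
```

Calling nearest_capacity() on each faulting index shows how far it sits
from a reallocation boundary under those assumptions.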

I've tried compiling python --with-pydebug, and exporting MALLOC_CHECK_=3,
and either of these seems to perturb things enough that the bug no
longer manifests.

I've also tried modifying the output testjcc.c so it doesn't
contain the lines from INSTALL_TYPE(JObject,module) ... to '__install__(module);', and it still seems to crash.  At this point, the amount of JCC
code running is really minimal ...


     Aaron Lav (asl2@pobox.com)


Re: Occasional SIGSEGV passing lists of ints to Java int[]

Posted by Andi Vajda <va...@apache.org>.
On Fri, 6 Feb 2009, Aaron Lav wrote:

> On Fri, Feb 06, 2009 at 11:50:14AM -0800, Andi Vajda wrote:
>>
>> Ok, so keeping these commented out, how much can you comment out of the
>> actual initVM() defined in jcc.cpp until it no longer crashes ?
>
> Unfortunately, if I add
>    Py_INCREF(Py_None);
>    return Py_None;
> just before
>     if (JNI_CreateJavaVM(&vm, (void **) &vm_env, &vm_args) < 0)
> it doesn't crash: if I put it just after, then it does.  So either
> initializing the JVM perturbs the environment enough to cause
> the bug to manifest, or the code responsible is somewhere in Hotspot.
>
>> Does turning --debug on have an effect on the crash ? (I expect that
>> to also turn compiler optimizations off -O0)
>
> No effect I noticed for --debug.  I checked the compiler command output,
> and it did include -O0.

It might be time to fiddle with vm args then.
Have you tried -Xrs ? increasing the Java stack (-Xms) ?
Maybe Java is garbage collecting something it shouldn't ?

Andi..

>
>    Aaron (asl2@pobox.com)
>

Re: Occasional SIGSEGV passing lists of ints to Java int[]

Posted by Aaron Lav <as...@pobox.com>.
On Fri, Feb 06, 2009 at 11:50:14AM -0800, Andi Vajda wrote:
>
> Ok, so keeping these commented out, how much can you comment out of the 
> actual initVM() defined in jcc.cpp until it no longer crashes ?

Unfortunately, if I add 
    Py_INCREF(Py_None);
    return Py_None;
just before
     if (JNI_CreateJavaVM(&vm, (void **) &vm_env, &vm_args) < 0)
it doesn't crash: if I put it just after, then it does.  So either
initializing the JVM perturbs the environment enough to cause
the bug to manifest, or the code responsible is somewhere in Hotspot.

> Does turning --debug on have an effect on the crash ? (I expect that
> to also turn compiler optimizations off -O0)

No effect I noticed for --debug.  I checked the compiler command output,
and it did include -O0.

    Aaron (asl2@pobox.com)

Re: Occasional SIGSEGV passing lists of ints to Java int[]

Posted by Andi Vajda <va...@apache.org>.
On Feb 6, 2009, at 11:19, Aaron Lav <as...@pobox.com> wrote:

> On Fri, Feb 06, 2009 at 11:07:24AM -0800, Andi Vajda wrote:
>>
>> On Fri, 6 Feb 2009, Aaron Lav wrote:
>>
>>>> Does it crash if you don't call initVM() ?
>>>
>>> No, the call to _testjcc.initVM(...) seems to be required to
>>> make it crash.
>>
>> There are two pieces to initVM():
>>  - initVM() proper (defined in jcc.cpp)
>>  - initializing your classes
>>
>> The initVM() that is called from Python is a function called
>> __initialize__() that is generated by JCC. It's defined in a file  
>> called
>> __init__.cpp. It first calls the actual initVM() and then calls the
>> __initialize__() on each top level package JCC generates wrappers  
>> for.
>>
>> For example, PyLucene's __initialize__() looks like:
>>
>> PyObject *__initialize__(PyObject *module, PyObject *args, PyObject  
>> *kwds)
>> {
>>    PyObject *env = initVM(module, args, kwds);
>>
>>    if (env == NULL)
>>        return NULL;
>>
>>    java::__initialize__(module);
>>    org::__initialize__(module);
>>
>>    return env;
>> }
>>
>> Does it still crash if you comment out the calls to
>> __initialize__(module) that the top level __initialize__() makes ?
>
> Yes.  I commented out the lines
>    #for name, entries in packages:
>    #    line(out, indent + 1, '%s::__initialize__(module);', name)
> from python.py, removed the build directory and rebuilt
> (checking that the 'java::__initialize__' and
> 'org::__initialize__' calls were gone from __init__.cpp), and reran,
> and it still faults.

Ok, so keeping these commented out, how much can you comment out of  
the actual initVM() defined in jcc.cpp until it no longer crashes ?

To make rebuilds fast, use shared
mode and just rebuild jcc so that you get a new libjcc.so (be sure to  
call install).

With non shared mode, you can edit the copy of jcc.cpp JCC copied into  
your extension's build tree and re-build with --install. Does turning  
--debug on have an effect on the crash ? (I expect that to also turn  
compiler optimizations off -O0)

Andi..

>
>
>    Aaron (asl2@pobox.com)

Re: Occasional SIGSEGV passing lists of ints to Java int[]

Posted by Aaron Lav <as...@pobox.com>.
On Fri, Feb 06, 2009 at 11:07:24AM -0800, Andi Vajda wrote:
>
> On Fri, 6 Feb 2009, Aaron Lav wrote:
>
>>> Does it crash if you don't call initVM() ?
>>
>> No, the call to _testjcc.initVM(...) seems to be required to
>> make it crash.
>
> There are two pieces to initVM():
>   - initVM() proper (defined in jcc.cpp)
>   - initializing your classes
>
> The initVM() that is called from Python is a function called  
> __initialize__() that is generated by JCC. It's defined in a file called  
> __init__.cpp. It first calls the actual initVM() and then calls the
> __initialize__() on each top level package JCC generates wrappers for.
>
> For example, PyLucene's __initialize__() looks like:
>
> PyObject *__initialize__(PyObject *module, PyObject *args, PyObject *kwds)
> {
>     PyObject *env = initVM(module, args, kwds);
>
>     if (env == NULL)
>         return NULL;
>
>     java::__initialize__(module);
>     org::__initialize__(module);
>
>     return env;
> }
>
> Does it still crash if you comment out the calls to  
> __initialize__(module) that the top level __initialize__() makes ?

Yes.  I commented out the lines
    #for name, entries in packages:
    #    line(out, indent + 1, '%s::__initialize__(module);', name)
from python.py, removed the build directory and rebuilt 
(checking that the 'java::__initialize__' and 
'org::__initialize__' calls were gone from __init__.cpp), and reran,
and it still faults.

    Aaron (asl2@pobox.com)


Re: Occasional SIGSEGV passing lists of ints to Java int[]

Posted by Andi Vajda <va...@apache.org>.
On Fri, 6 Feb 2009, Aaron Lav wrote:

>> Does it crash if you don't call initVM() ?
>
> No, the call to _testjcc.initVM(...) seems to be required to
> make it crash.

There are two pieces to initVM():
   - initVM() proper (defined in jcc.cpp)
   - initializing your classes

The initVM() that is called from Python is a function called 
__initialize__() that is generated by JCC. It's defined in a file called 
__init__.cpp. It first calls the actual initVM() and then calls the
__initialize__() on each top level package JCC generates wrappers for.

For example, PyLucene's __initialize__() looks like:

PyObject *__initialize__(PyObject *module, PyObject *args, PyObject *kwds)
{
     PyObject *env = initVM(module, args, kwds);

     if (env == NULL)
         return NULL;

     java::__initialize__(module);
     org::__initialize__(module);

     return env;
}

Does it still crash if you comment out the calls to 
__initialize__(module) that the top level __initialize__() makes ?

Andi..


Re: Occasional SIGSEGV passing lists of ints to Java int[]

Posted by Aaron Lav <as...@pobox.com>.
On Fri, Feb 06, 2009 at 10:44:56AM -0800, Andi Vajda wrote:
>
> On Feb 6, 2009, at 10:07, Aaron Lav <as...@pobox.com> wrote:
>
>> On Thu, Feb 05, 2009 at 02:45:21PM -0500, Aaron Lav wrote:
>>>
>>>
>> (apologies for the broken threading.  I don't seem to be
>> getting email from this list: I've tried resubscribing.)
>>
>>
>>> Have you tried moving things around, like creating the arrays
>>> differently ?
>>> For example: a = list(xrange(count)) instead of the list
>>> comprehension ?
>>
>> I've tried list(xrange(count)) and range(count), and those don't fail.
>>
>>    a = []
>>    for i in range(count):
>>        a.append(i)
>>    b = []
>>    for i in range(count):
>>        b.append(i)
>>
>> does fail.
>>
>>> Did you try a more recent version of Python ?
>>
>> I've just tried locally building 2.6.1 on my gutsy laptop, and
>> it still fails with that.
>>
>>> Have you tried this on 32-bit ?
>>
>> I've built an i386-arch KVM VM, and it doesn't crash there.  (It
>> does crash on a very similar x86-64 VM.)
>>
>>> When you don't actually use the arrays after creating them, how are
>>> you getting the crash ? inside the assert loops ?
>>
>> Yes.  (Some print statements would have made that clearer.)
>>
>> Other ideas I'm planning:
>> * try a python --with-pydebug build
>> * stub out parts of jcc initialization.  (Obviously this could just  
>> move the bug around.)
>> * fiddle with the JVM's GC parameters
>> * see if setting a hardware watchpoint earlier catches where the data 
>> is being changed, or if it seems to be being put in the list wrong.
>
> Does it crash if you don't call initVM() ?

No, the call to _testjcc.initVM(...) seems to be required to
make it crash.

    Aaron Lav (asl2@pobox.com)

Re: Occasional SIGSEGV passing lists of ints to Java int[]s

Posted by Andi Vajda <va...@apache.org>.
On Feb 6, 2009, at 10:07, Aaron Lav <as...@pobox.com> wrote:

> On Thu, Feb 05, 2009 at 02:45:21PM -0500, Aaron Lav wrote:
>>
>>
> (apologies for the broken threading.  I don't seem to be
> getting email from this list: I've tried resubscribing.)
>
>
>> Have you tried moving things around, like creating the arrays
>> differently ?
>> For example: a = list(xrange(count)) instead of the list
>> comprehension ?
>
> I've tried list(xrange(count)) and range(count), and those don't fail.
>
>    a = []
>    for i in range(count):
>        a.append(i)
>    b = []
>    for i in range(count):
>        b.append(i)
>
> does fail.
>
>> Did you try a more recent version of Python ?
>
> I've just tried locally building 2.6.1 on my gutsy laptop, and
> it still fails with that.
>
>> Have you tried this on 32-bit ?
>
> I've built an i386-arch KVM VM, and it doesn't crash there.  (It
> does crash on a very similar x86-64 VM.)
>
>> When you don't actually use the arrays after creating them, how are
>> you getting the crash ? inside the assert loops ?
>
> Yes.  (Some print statements would have made that clearer.)
>
> Other ideas I'm planning:
> * try a python --with-pydebug build
> * stub out parts of jcc initialization.  (Obviously this could just  
> move the bug around.)
> * fiddle with the JVM's GC parameters
> * see if setting a hardware watchpoint earlier catches where the  
> data is being changed, or if it seems to be being put in the list  
> wrong.

Does it crash if you don't call initVM() ?

Andi..

>
>
>   Aaron Lav (asl2@pobox.com)
>

Re: Occasional SIGSEGV passing lists of ints to Java int[]s

Posted by Aaron Lav <as...@pobox.com>.
On Thu, Feb 05, 2009 at 02:45:21PM -0500, Aaron Lav wrote:
> 
> 
(apologies for the broken threading.  I don't seem to be
getting email from this list: I've tried resubscribing.)


> Have you tried moving things around, like creating the arrays differently ?
> For example: a = list(xrange(count)) instead of the list comprehension ?

I've tried list(xrange(count)) and range(count), and those don't fail.

    a = []
    for i in range(count):
        a.append(i)
    b = []
    for i in range(count):
        b.append(i)

does fail.
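
(For anyone who wants to try this without the attachment, a self-contained
Python 3 sketch of the failing pattern follows.  It is not the attached
test_array.py; the function names and the count are assumptions, and the
initVM() call that seems to be needed to trigger the fault is deliberately
left out, so on its own this should simply pass.)

```python
def build_by_append(count):
    """Grow a list one append at a time, the pattern that faults."""
    a = []
    for i in range(count):
        a.append(i)
    return a


def check_ints(seq):
    """Touch every element, as the assert loops in the test do."""
    for i, item in enumerate(seq):
        assert type(item) is int and item == i, "bad element at %d" % i


count = 2000000  # assumed size; large enough to pass many reallocations
a = build_by_append(count)
b = build_by_append(count)
check_ints(a)
check_ints(b)
```

Swapping build_by_append(count) for list(range(count)) gives the variant
that allocates the list in one step instead of growing it.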

> Did you try a more recent version of Python ? 

I've just tried locally building 2.6.1 on my gutsy laptop, and
it still fails with that.

> Have you tried this on 32-bit ?

I've built an i386-arch KVM VM, and it doesn't crash there.  (It
does crash on a very similar x86-64 VM.)

> When you don't actually use the arrays after creating them, how are you
> getting the crash ? inside the assert loops ?

Yes.  (Some print statements would have made that clearer.)

Other ideas I'm planning:
 * try a python --with-pydebug build
 * stub out parts of jcc initialization.  (Obviously this could just move the bug around.)
 * fiddle with the JVM's GC parameters
 * see if setting a hardware watchpoint earlier catches where the data is being changed, or if it seems to be being put in the list wrong.

   Aaron Lav (asl2@pobox.com)