Posted to pylucene-dev@lucene.apache.org by "Erik Groeneveld (Jira)" <ji...@apache.org> on 2021/07/08 08:36:00 UTC

[jira] [Comment Edited] (PYLUCENE-58) SEGV on import lucene

    [ https://issues.apache.org/jira/browse/PYLUCENE-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376483#comment-17376483 ] 

Erik Groeneveld edited comment on PYLUCENE-58 at 7/8/21, 8:35 AM:
------------------------------------------------------------------

Hi Andy, thanks for your efforts!

I had a lengthy holiday, so I am a bit late with this response.

Too bad you were not able to reproduce it. It is persistent here.

Stepping through with gdb showed that the SEGV occurred in PyErr_BadInternalCall(), which was trying to format an error message because PyDict_Check(op) reported that op is not a dict. But 'op' in this case was a freshly created dict for the new lucene module. How can that be invalid?

The only hypothesis I can think of is that memory or GC state is corrupted earlier, because the code that crashes does nothing with (PyLucene) input parameters; it just creates a dict. That means the missing INCREF could very well be the problem.

So I reran the test with trunk.

Building the trunk did not work because:
 # the Makefile assumes ~/apache/lucene.git
 # it generates the 8.9.0 version
 # it does not have the lucene jars

To work around this, I downloaded the 8.9.0 sources and patched trunk over them with svn export --force. Indeed, makeType now has the INCREF.

Unfortunately, the SEGV still happens.

I tracked it down to a point where Py_DECREF is called on a temporary unicode string. That object's type has invalid values (0x27, 0x47, etc.) in most of its tp_xxx fields, and it SEGVs when trying to retrieve the tp_dealloc function. It looks like the type object for unicode strings is gone and the ob_type of the unicode object points to garbage.
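The crash site is consistent with how Py_DECREF works: when the count drops to zero, the deallocator is looked up through the object's ob_type pointer, so a garbage ob_type faults exactly at the tp_dealloc fetch. The sketch below is a simplified, hypothetical model (ToyObject, ToyType, toy_decref and toy_new_str are illustrative names, not CPython's definitions); real CPython builds add further bookkeeping.

```c
#include <assert.h>

/* Hypothetical toy model of a CPython object header: a reference count plus a
   pointer to the type object that holds the per-type function table. */
struct ToyType;

typedef struct {
    long ob_refcnt;
    const struct ToyType *ob_type;
} ToyObject;

typedef struct ToyType {
    const char *tp_name;
    void (*tp_dealloc)(ToyObject *);
} ToyType;

static int dealloc_calls = 0;

static void toy_str_dealloc(ToyObject *o) {
    (void)o;
    dealloc_calls++; /* a real deallocator would free the memory here */
}

static const ToyType ToyStr_Type = {"str", toy_str_dealloc};

/* Simplified shape of Py_DECREF. The assert loosely mirrors the
   negative-refcount check of debug builds. At zero, the deallocator is
   fetched through ob_type: if ob_type points at garbage, this indirect call
   is where the process crashes. */
static void toy_decref(ToyObject *o) {
    assert(o->ob_refcnt > 0);
    if (--o->ob_refcnt == 0)
        o->ob_type->tp_dealloc(o);
}

/* Creates a temporary "string" holding one owned reference. */
static ToyObject toy_new_str(void) {
    ToyObject o = {1, &ToyStr_Type};
    return o;
}
```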

This could be a bug in Python. However, the code that creates the dict is completely generic (nothing in it relates to PyLucene), so I do not think that is the case.

I suspect one DECREF too many, or some other bug in the initialisation of PyLucene. However, I cannot pinpoint it at the moment, and I have few clues about how to proceed from here.

 

EDIT: To be sure that the context is not causing the problem, I tried loading one of our own extensions, and I also created a completely new extension. Both initialise properly. Note that _lucene.so already fails inside PyInit__lucene(...), so the new extension does nothing but initialise, like this:

 
{code:c}
#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* Minimal module definition: only the base, name and size are set. */
static struct PyModuleDef tsetdn_def = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "tsetdn",
    .m_size = 0,
};

/* Module init function: Python calls PyInit_<name> on import. */
PyMODINIT_FUNC PyInit_tsetdn(void) {
    printf("INIT TSETDN\n");
    PyObject* m = PyModule_Create(&tsetdn_def);
    return m;
}
{code}
To be doubly sure, I also stripped the lucene init function down to just the PyModule_Create call, and the lucene PyModuleDef down to just the base, name and size.

The init code should be the first code to run, yet it fails. Is there any static initialisation in lucene that happens before the init function is called?

END EDIT
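For context on the static-initialisation question: on Linux, a shared object can run code before its init function is ever called. C++ global objects with constructors, and functions marked with the GCC/Clang constructor attribute, run while the dynamic loader maps the library (for an extension, during Python's dlopen), before PyInit_<name> is looked up. The sketch below is a hypothetical standalone illustration of that ordering, not code from _lucene.so; whether the JCC-generated C++ actually contains such initialisers is an assumption to check (with GCC they typically appear as _GLOBAL__sub_I_... symbols). toy_module_init is a hypothetical stand-in for the module's init function.

```c
#include <assert.h>

/* Records the order in which the two functions below run. */
static int call_order[2];
static int calls = 0;

/* GCC/Clang extension: runs when the binary or shared object is loaded,
   before any of its exported functions can be called. */
static void __attribute__((constructor)) static_init(void) {
    call_order[calls++] = 1;
}

/* Hypothetical stand-in for PyInit_<module>: it can only be called after the
   object has been loaded, i.e. after static_init has already run. */
int toy_module_init(void) {
    call_order[calls++] = 2;
    return 0;
}
```

If a static initialiser like this corrupted interpreter state, the damage would already be in place by the time PyInit__lucene calls PyModule_Create.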

 

 

Maybe you have a suggestion?

Best regards,

Erik



> SEGV on import lucene
> ---------------------
>
>                 Key: PYLUCENE-58
>                 URL: https://issues.apache.org/jira/browse/PYLUCENE-58
>             Project: PyLucene
>          Issue Type: Bug
>         Environment: Debian Buster, Python 3.7
>            Reporter: Erik Groeneveld
>            Priority: Critical
>
> Hi Andy,
> Thanks again for your great work on PyLucene and JCC!
> Recently, after porting everything to Python 3, we get occasional SEGVs on shutdown. It happens very late, when the garbage collector starts cleaning up.
> Using python3-dbg, however, exposed another problem: with python3-dbg, "import lucene" already triggers a SEGV. Here is the top of the backtrace:
>  
> {code:bash}
> #0  0x0000000000000060 in ?? ()
> #1  0x00007fe8aee51d6e in unicode_fromformat_write_cstr (writer=writer@entry=0x7ffdc0dcd170, str=<optimized out>, width=width@entry=-1, precision=<optimized out>) at ../Objects/unicodeobject.c:2596
> #2  0x00007fe8aee525ec in unicode_fromformat_arg (vargs=0x7ffdc0dcd150, f=<optimized out>, writer=0x7ffdc0dcd170) at ../Objects/unicodeobject.c:2797
> #3  PyUnicode_FromFormatV (format=<optimized out>, vargs=<optimized out>) at ../Objects/unicodeobject.c:2914
> #4  0x00007fe8aedca3dd in PyErr_FormatV (exception=<type at remote 0x811cc0>, format=0x7fe8aefe2568 "%s:%d: bad argument to internal function", vargs=vargs@entry=0x7ffdc0dcd210) at ../Python/errors.c:835
> #5  0x00007fe8aedca4a4 in PyErr_Format (exception=<optimized out>, format=<optimized out>) at ../Python/errors.c:852
> #6  0x00007fe8aee89fcd in PyDict_SetItem (op=<optimized out>, key=<optimized out>, value=<optimized out>) at ../Objects/dictobject.c:1448
> #7  PyDict_SetItem (op=<optimized out>, key=<optimized out>, value=<optimized out>, op=<optimized out>, key=<optimized out>, value=<optimized out>) at ../Objects/dictobject.c:1443
> #8  0x00007fe8aee76f4a in module_init_dict (md_dict=<unknown at remote 0x7fe8ae9f6060>, name=name@entry=<unknown at remote 0x7fe8ae9f5030>, doc=None, doc@entry=0x0, mod=<optimized out>) at ../Objects/moduleobject.c:72
> #9  0x00007fe8aee7da83 in PyModule_NewObject (name=name@entry=<unknown at remote 0x7fe8ae9f5030>) at ../Objects/moduleobject.c:103
> #10 0x00007fe8aee7de2a in PyModule_New (name=name@entry=0x7fe8b32bfa20 "lucene._lucene") at ../Objects/moduleobject.c:120
> #11 0x00007fe8aee7deec in _PyModule_CreateInitialized (module=0x7fe8b2612080 <_lucene_def>, module_api_version=<optimized out>) at ../Objects/moduleobject.c:215
> #12 0x00007fe8b1238de7 in PyInit__lucene () from /data/bouwen/van_kras/pylucene-8.6.1/build/test/lucene-8.6.1-py3.7-linux-x86_64.egg/lucene/_lucene.cpython-37m-x86_64-linux-gnu.so
> {code}
> It could be that this goes undetected with the regular (non-debug) Python, yet causes a SEGV on shutdown.
>  
> The error above can be reproduced with the following script, which downloads the sources, builds JCC and PyLucene, and then executes: python3-dbg -c "import lucene"
>  
> {code:bash}
> # Environment
> # debian buster
> # ant 1.10.5-2
> export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
> export PYTHON=/usr/bin/python3
> export PYLUCENE="pylucene-8.6.1"
> rm -rf ${PYLUCENE}-src.tar.gz ${PYLUCENE}
> wget https://ftp.nluug.nl/internet/apache/lucene/pylucene/${PYLUCENE}-src.tar.gz
> tar xzf ${PYLUCENE}-src.tar.gz
> (cd ${PYLUCENE}
>     (cd jcc
>         export JCC_JDK=${JAVA_HOME}
>         export JCC_INCLUDES=/usr/include/python3.7m:${JAVA_HOME}/include:${JAVA_HOME}/include/linux
>         ${PYTHON} setup.py build
>     )
>     export NUM_FILES=10
>     export ANT=/usr/bin/ant
>     export JCC="${PYTHON} -m jcc --shared"
>     make
>     make test
> )
> # export, so that the python3-dbg child process inherits the path
> export PYTHONPATH='pylucene-8.6.1/build/test/lucene-8.6.1-py3.7-linux-x86_64.egg'
> ${PYTHON}-dbg -c "import lucene"
> {code}
> Would you be so kind as to look into this? Perhaps it solves our problem, or it helps us find another problem at shutdown.
> Best regards,
> Erik



--
This message was sent by Atlassian Jira
(v8.3.4#803005)