Posted to pylucene-dev@lucene.apache.org by "Erik Groeneveld (Jira)" <ji...@apache.org> on 2021/07/07 10:54:00 UTC

[jira] [Commented] (PYLUCENE-58) SEGV on import lucene

    [ https://issues.apache.org/jira/browse/PYLUCENE-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376483#comment-17376483 ] 

Erik Groeneveld commented on PYLUCENE-58:
-----------------------------------------

Hi Andy, thanks for your efforts!

I had a lengthy holiday, so I am a bit late with this response.

Too bad you were not able to reproduce it. It is persistent here.

Stepping through with gdb showed that the SEGV occurred in PyErr_BadInternalCall(), which tried to format an error message because PyDict_Check(op) said op was invalid. But 'op' in this case was a freshly minted dict for the new lucene module. How can that be invalid?
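
To illustrate how a fresh dict could fail that check: as far as I know, PyDict_Check() does not look at the dict's contents; it tests a flag bit on the object's type. The sketch below mirrors that test from Python. The constant (bit 29, Py_TPFLAGS_DICT_SUBCLASS in CPython's object.h) and the helper name looks_like_dict are assumptions made for this example, not something taken from PyLucene or JCC.

```python
# Assumption: Py_TPFLAGS_DICT_SUBCLASS is bit 29 of tp_flags in CPython.
PY_TPFLAGS_DICT_SUBCLASS = 1 << 29


def looks_like_dict(obj):
    """Hypothetical helper: mimic PyDict_Check() by testing the type's flag bit."""
    return bool(type(obj).__flags__ & PY_TPFLAGS_DICT_SUBCLASS)


class MyDict(dict):
    """A dict subclass inherits the flag bit from dict."""


print(looks_like_dict({}), looks_like_dict(MyDict()), looks_like_dict([]))
```

If ob_type of the new dict points at freed or overwritten memory, the flag bit is read from garbage, so the check can fail even though the dict itself was created correctly.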

The only hypothesis I can think of is that memory or GC state was corrupted earlier, because the code that crashes does nothing with (PyLucene) input parameters; it just creates a dict. That means the missing INCREF could very well be the problem.
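
For reference, a minimal sketch of the bookkeeping a missing INCREF upsets. It uses ctypes to call Py_IncRef and Py_DecRef from Python, always in balanced pairs; the Payload class is a made-up stand-in for an object whose reference an extension stores. It assumes CPython, and it is not the JCC code.

```python
import ctypes
import sys

# Py_IncRef / Py_DecRef are the function forms of Py_INCREF / Py_DECREF.
incref = ctypes.pythonapi.Py_IncRef
incref.argtypes = [ctypes.py_object]
incref.restype = None
decref = ctypes.pythonapi.Py_DecRef
decref.argtypes = [ctypes.py_object]
decref.restype = None


class Payload:
    """Hypothetical stand-in for an object an extension keeps a pointer to."""


obj = Payload()
baseline = sys.getrefcount(obj)

incref(obj)                       # what the extension should do when storing obj
after_incref = sys.getrefcount(obj)

decref(obj)                       # the matching release when the pointer is dropped
after_decref = sys.getrefcount(obj)

print(baseline, after_incref, after_decref)
```

If the INCREF is skipped but the later DECREF still happens, the count ends up one too low, and the object can be freed while something still points to it. The sketch deliberately never does that, because it would corrupt the interpreter in the same way.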

So I reran the test with trunk.

Building the trunk did not work because:
 # the Makefile assumes ~/apache/lucene.git
 # it generates the 8.9.0 version
 # it does not have the lucene jars

To work around this, I downloaded the 8.9.0 sources and patched the trunk over them with
 svn export --force. Indeed, there is an INCREF in makeType now.

Unfortunately, the SEGV still happens.

I tracked it down to a point where a Py_DECREF is done on a temporary unicode string. That object's type has invalid values for most of its tp_xxx fields: 0x27, 0x47, etc. It SEGVs when trying to retrieve the tp_dealloc function. It looks like the type object for unicode strings is gone and ob_type (of the unicode object) is pointing to garbage.
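
As a point of comparison, here is a small sketch of what a healthy ob_type looks like when read straight from memory. It assumes CPython, where id(obj) is the object's address, and it assumes ob_type is the last field of the PyObject header, which is why the offset is derived from object.__basicsize__ rather than hard-coded. The helper name ob_type_address is made up for this example.

```python
import ctypes


def ob_type_address(obj):
    """Hypothetical helper: read the raw ob_type pointer of a live object."""
    # Assumption: ob_type is the last pointer-sized field of the PyObject header.
    offset = object.__basicsize__ - ctypes.sizeof(ctypes.c_void_p)
    return ctypes.c_void_p.from_address(id(obj) + offset).value


text = "temporary unicode string"
type_addr = ob_type_address(text)

# For a healthy object, the pointer is the address of its type object.
print(hex(type_addr), type_addr == id(str))
```

In the crashing process the same pointer no longer leads to a valid type object, so reading tp_dealloc through it yields values like the ones above.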

This could be a bug in Python. However, the code that creates the dict is completely generic (no arguments related to PyLucene whatsoever), so I think that is unlikely.

I suspect one DECREF too many, or some other bug in the initialisation of PyLucene. However, I cannot pinpoint it at the moment and I have few clues about how to proceed from here.

Maybe you have a suggestion?

Best regards,

Erik

> SEGV on import lucene
> ---------------------
>
>                 Key: PYLUCENE-58
>                 URL: https://issues.apache.org/jira/browse/PYLUCENE-58
>             Project: PyLucene
>          Issue Type: Bug
>         Environment: Debian Buster, Python 3.7
>            Reporter: Erik Groeneveld
>            Priority: Critical
>
> Hi Andy,
> Thanks again for your great work on PyLucene and JCC!
> Recently, after porting everything to python3, we get occasional SEGV's on shutdown. It happens very late, when the garbage collector starts cleaning up.
> Using python3-dbg exposed another problem however. With python3-dbg, "import lucene" already triggers SEGV. Here is the top of the backtrace:
>  
> {code:bash}
> #0  0x0000000000000060 in ?? ()
> #1  0x00007fe8aee51d6e in unicode_fromformat_write_cstr (writer=writer@entry=0x7ffdc0dcd170, str=<optimized out>, width=width@entry=-1, precision=<optimized out>) at ../Objects/unicodeobject.c:2596
> #2  0x00007fe8aee525ec in unicode_fromformat_arg (vargs=0x7ffdc0dcd150, f=<optimized out>, writer=0x7ffdc0dcd170) at ../Objects/unicodeobject.c:2797
> #3  PyUnicode_FromFormatV (format=<optimized out>, vargs=<optimized out>) at ../Objects/unicodeobject.c:2914
> #4  0x00007fe8aedca3dd in PyErr_FormatV (exception=<type at remote 0x811cc0>, format=0x7fe8aefe2568 "%s:%d: bad argument to internal function", vargs=vargs@entry=0x7ffdc0dcd210) at ../Python/errors.c:835
> #5  0x00007fe8aedca4a4 in PyErr_Format (exception=<optimized out>, format=<optimized out>) at ../Python/errors.c:852
> #6  0x00007fe8aee89fcd in PyDict_SetItem (op=<optimized out>, key=<optimized out>, value=<optimized out>) at ../Objects/dictobject.c:1448
> #7  PyDict_SetItem (op=<optimized out>, key=<optimized out>, value=<optimized out>, op=<optimized out>, key=<optimized out>, value=<optimized out>) at ../Objects/dictobject.c:1443
> #8  0x00007fe8aee76f4a in module_init_dict (md_dict=<unknown at remote 0x7fe8ae9f6060>, name=name@entry=<unknown at remote 0x7fe8ae9f5030>, doc=None, doc@entry=0x0, mod=<optimized out>) at ../Objects/moduleobject.c:72
> #9  0x00007fe8aee7da83 in PyModule_NewObject (name=name@entry=<unknown at remote 0x7fe8ae9f5030>) at ../Objects/moduleobject.c:103
> #10 0x00007fe8aee7de2a in PyModule_New (name=name@entry=0x7fe8b32bfa20 "lucene._lucene") at ../Objects/moduleobject.c:120
> #11 0x00007fe8aee7deec in _PyModule_CreateInitialized (module=0x7fe8b2612080 <_lucene_def>, module_api_version=<optimized out>) at ../Objects/moduleobject.c:215
> #12 0x00007fe8b1238de7 in PyInit__lucene () from /data/bouwen/van_kras/pylucene-8.6.1/build/test/lucene-8.6.1-py3.7-linux-x86_64.egg/lucene/_lucene.cpython-37m-x86_64-linux-gnu.so
> {code}
> It could be that this goes undetected with normal python, yet causes a SEGV on shutdown.
>  
> The error above can be reproduced with the following script, which downloads the sources, builds JCC and PyLucene, and then executes: python3-dbg -c "import lucene"
>  
> {code:bash}
> # Environment
> # debian buster
> # ant 1.10.5-2
> export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
> export PYTHON=/usr/bin/python3
> export PYLUCENE="pylucene-8.6.1"
> rm ${PYLUCENE}-src.tar.gz ${PYLUCENE} -rf
> wget https://ftp.nluug.nl/internet/apache/lucene/pylucene/${PYLUCENE}-src.tar.gz
> tar xzf ${PYLUCENE}-src.tar.gz
> (cd ${PYLUCENE}
>     (cd jcc
>         export JCC_JDK=${JAVA_HOME}
>         export JCC_INCLUDES=/usr/include/python3.7m:${JAVA_HOME}/include:${JAVA_HOME}/include/linux
>         ${PYTHON} setup.py build
>     )
>     export NUM_FILES=10
>     export ANT=/usr/bin/ant
>     export JCC="${PYTHON} -m jcc --shared"
>     make
>     make test
> )
> export PYTHONPATH='pylucene-8.6.1/build/test/lucene-8.6.1-py3.7-linux-x86_64.egg'
> ${PYTHON}-dbg -c "import lucene"
> {code}
> Would you be so kind as to look into this? Perhaps our problem is solved, or it enables us to find another problem at shutdown.
> Best regards,
> Erik


