Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/18 07:50:10 UTC

[GitHub] [incubator-mxnet] davidhewitt opened a new issue #18754: Python 3.8 wheel appears to be slightly corrupted

davidhewitt opened a new issue #18754:
URL: https://github.com/apache/incubator-mxnet/issues/18754


   ## Description
   I'm attempting to debug a crash observed by a user of PyO3 (https://github.com/PyO3/pyo3/issues/1044) which occurs when `mxnet` is imported.
   
   Attempting to use `gdb` (via the `rust-gdb` wrapper script) suggests that `libmxnet.so` is partially corrupted. `readelf -a` also emits some warnings. Both are pasted below.
   
   ### Error Message
   
   Errors seen from `readelf -a path/to/libmxnet.so | grep -i warning`:
   
   ```
   readelf: Warning: Section 0 has an out of range sh_link value of 1179403647
   readelf: Warning: Section 1 has an out of range sh_link value of 2381354254
   readelf: Warning: Section 2 has an out of range sh_link value of 3825592164
   readelf: Warning: Section 3 has an out of range sh_link value of 149717832
   readelf: Warning: Section 3 has an out of range sh_info value of 3171257160
   readelf: Warning: Section 4 has an out of range sh_link value of 2781517102
   readelf: Warning: Section 4 has an out of range sh_info value of 3221398224
   readelf: Warning: Section 5 has an out of range sh_link value of 222961676
   readelf: Warning: Section 5 has an out of range sh_info value of 2840782096
   readelf: Warning: Section 6 has an out of range sh_link value of 536469743
   readelf: Warning: Section 7 has an out of range sh_link value of 548555395
   readelf: Warning: Section 7 has an out of range sh_info value of 564797439
   readelf: Warning: [ 0]: Unexpected value (65794) in info field.
   readelf: Warning: Size of section 1 is larger than the entire file!
   readelf: Warning: [ 3]: Expected link to another section in info field
   readelf: Warning: [ 4]: Expected link to another section in info field
   readelf: Warning: [ 5]: Expected link to another section in info field
   readelf: Warning: Size of section 6 is larger than the entire file!
   readelf: Warning: [ 7]: Expected link to another section in info field
   readelf: Error: no .dynamic section in the dynamic segment
   ```
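Some of the out-of-range values above look like ordinary file content rather than random damage. As a hedged observation (not something confirmed in this thread), decoding them as little-endian 32-bit integers shows that the `sh_link` reported for section 0 is the ELF magic, and that the value 65794 from the info field matches the four identification bytes that follow the magic in a 64-bit little-endian ELF header. The two fields are adjacent in a 64-bit section header, just as those bytes are adjacent at the start of an ELF file, which would be consistent with `readelf` interpreting a region that does not actually hold section headers. The helper name `as_le_bytes` is introduced here purely for illustration.

```python
import struct


def as_le_bytes(value: int) -> bytes:
    """Return the 4 bytes a little-endian reader would have consumed for `value`."""
    return struct.pack("<I", value)


# sh_link reported for section 0 in the readelf output above
print(as_le_bytes(1179403647))  # b'\x7fELF'

# "Unexpected value (65794) in info field" reported for section 0:
# EI_CLASS=2 (64-bit), EI_DATA=1 (little-endian), EI_VERSION=1, EI_OSABI=0
print(as_le_bytes(65794))  # b'\x02\x01\x01\x00'
```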
   
   Errors seen from `gdb`:
   
   ```
   BFD: warning: /home/david/dev/pyo3-scratch/.direnv/python-3.8.2/lib/python3.8/site-packages/mxnet/libmxnet.so has a corrupt section with a size (ff20fb3700000008) larger than the file size
   BFD: warning: /home/david/dev/pyo3-scratch/.direnv/python-3.8.2/lib/python3.8/site-packages/mxnet/libmxnet.so has a corrupt section with a size (ff20fb3700000008) larger than the file size
   Error while mapping shared library sections:
   `/home/david/dev/pyo3-scratch/.direnv/python-3.8.2/lib/python3.8/site-packages/mxnet/libmxnet.so': not in executable format: file format not recognized
   ```
   
   ## To Reproduce
   Run `readelf -a path/to/libmxnet.so | grep -i warning`.
   
   Alternatively, on request I can write a tutorial on how to install and run the linked Rust code under rust-lldb.
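Where `readelf` is not at hand, the two checks behind most of the warnings (an `sh_link` index beyond the section count, and a section size larger than the file) can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not a reimplementation of `readelf`: it handles only 64-bit little-endian ELF files, ignores extended section numbering, and the names `check_sections` and `build_demo_elf` are hypothetical helpers introduced for this example. The demo input is a synthetic two-section file built in memory, not the real `libmxnet.so`.

```python
import struct
from typing import List

ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")    # Elf64_Shdr, little-endian
SHT_NOBITS = 8  # sections of this type (e.g. .bss) occupy no space in the file


def check_sections(data: bytes) -> List[str]:
    """Return warnings for out-of-range sh_link values and oversized sections."""
    if len(data) < ELF_HEADER.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled in this sketch")

    fields = ELF_HEADER.unpack_from(data, 0)
    e_shoff, e_shentsize, e_shnum = fields[6], fields[11], fields[12]

    warnings = []
    for index in range(e_shnum):
        offset = e_shoff + index * e_shentsize
        if offset + SECTION_HEADER.size > len(data):
            warnings.append(f"section header {index} lies outside the file")
            break
        (_name, sh_type, _flags, _addr, _offset, sh_size,
         sh_link, _info, _align, _entsize) = SECTION_HEADER.unpack_from(data, offset)
        if sh_link >= e_shnum:
            warnings.append(f"section {index}: sh_link {sh_link} out of range")
        if sh_type != SHT_NOBITS and sh_size > len(data):
            warnings.append(f"section {index}: size larger than the file")
    return warnings


def build_demo_elf(link_value: int) -> bytes:
    """Build a tiny synthetic ELF image whose second section uses `link_value`."""
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    header = ELF_HEADER.pack(
        ident, 3, 62, 1,          # e_type=ET_DYN, e_machine=x86-64, e_version
        0, 0, ELF_HEADER.size,    # e_entry, e_phoff, e_shoff (right after header)
        0, ELF_HEADER.size, 0, 0, # e_flags, e_ehsize, e_phentsize, e_phnum
        SECTION_HEADER.size, 2, 0 # e_shentsize, e_shnum, e_shstrndx
    )
    null_section = SECTION_HEADER.pack(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
    second_section = SECTION_HEADER.pack(0, 1, 0, 0, 0, 0, link_value, 0, 0, 0)
    return header + null_section + second_section


print(check_sections(build_demo_elf(1179403647)))
```

To try it on a real library, read the file in binary mode and pass the bytes to `check_sections`; an empty list means neither check fired.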
   
   ## Environment
   
   I'm using Ubuntu 20.04 on WSL2. According to pip, `mxnet` was installed via the wheel `mxnet-1.6.0-py2.py3-none-any.whl`.
   
   ```
   curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
   
   ----------Python Info----------
   Version      : 3.8.2
   Compiler     : GCC 9.3.0
   Build        : ('default', 'Apr 27 2020 15:53:34')
   Arch         : ('64bit', 'ELF')
   ------------Pip Info-----------
   Version      : 20.0.2
   Directory    : /home/david/dev/pyo3-scratch/.direnv/python-3.8.2/lib/python3.8/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.6.0
   Directory    : /home/david/dev/pyo3-scratch/.direnv/python-3.8.2/lib/python3.8/site-packages/mxnet
   Num GPUs     : 0
   Commit Hash   : 6eec9da55c5096079355d1f1a5fa58dcf35d6752
   ----------System Info----------
   Platform     : Linux-4.19.104-microsoft-standard-x86_64-with-glibc2.29
   system       : Linux
   node         : david-laptop
   release      : 4.19.104-microsoft-standard
   version      : #1 SMP Wed Feb 19 06:37:35 UTC 2020
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:                    x86_64
   CPU op-mode(s):                  32-bit, 64-bit
   Byte Order:                      Little Endian
   Address sizes:                   39 bits physical, 48 bits virtual
   CPU(s):                          12
   On-line CPU(s) list:             0-11
   Thread(s) per core:              2
   Core(s) per socket:              6
   Socket(s):                       1
   Vendor ID:                       GenuineIntel
   CPU family:                      6
   Model:                           165
   Model name:                      Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
   Stepping:                        2
   CPU MHz:                         2592.007
   BogoMIPS:                        5184.01
   Hypervisor vendor:               Microsoft
   Virtualization type:             full
   L1d cache:                       192 KiB
   L1i cache:                       192 KiB
   L2 cache:                        1.5 MiB
   L3 cache:                        12 MiB
   Vulnerability Itlb multihit:     KVM: Vulnerable
   Vulnerability L1tf:              Not affected
   Vulnerability Mds:               Not affected
   Vulnerability Meltdown:          Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
   Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
   Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
   Vulnerability Tsx async abort:   Not affected
   Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0236 sec, LOAD: 0.7427 sec.
   Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0020 sec, LOAD: 0.6102 sec.
   Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0815 sec, LOAD: 0.5328 sec.
   Timing for D2L: http://d2l.ai, DNS: 0.0247 sec, LOAD: 0.0889 sec.
   Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0254 sec, LOAD: 0.2696 sec.
   Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0943 sec, LOAD: 0.7244 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0236 sec, LOAD: 0.5804 sec.
   Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.02408432960510254 sec.
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on issue #18754: Python 3.8 wheel appears to be slightly corrupted

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18754:
URL: https://github.com/apache/incubator-mxnet/issues/18754#issuecomment-662045171


   cc @eric-haibin-lin as the issue may be related to engine shutdown.
   
   Debugging leads us to this line: https://github.com/apache/incubator-mxnet/blob/a0e67353fe81ed97fc7aef2d8429a93dc035a394/src/c_api/c_api.cc#L1318.
   
   > Running import mxnet directly in python (and exiting the interpreter) exits normally.
   > 
   > Strangely, on Python 3.8 we get a corrupted double-linked list instead.
   > 
   > I realize that mxnet is absolutely huge, but thought you guys might be interested in knowing about a specific edge-case which causes problems for pyo3. Of course, if you guys are able to provide a bit of insight that'd be super helpful. :) Thank you!
   > 
   
   Via https://github.com/PyO3/pyo3/issues/1044





[GitHub] [incubator-mxnet] leezu commented on issue #18754: Python 3.8 wheel appears to be slightly corrupted

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18754:
URL: https://github.com/apache/incubator-mxnet/issues/18754#issuecomment-662042770


   Thank you for trying that @davidhewitt. The difference between the libmxnet.so in the pip wheel and the one obtained via the build_from_source guide is that in the pip wheel a number of dependencies are statically linked to enable portability across various Linux distributions and versions. The wheel is built on a CentOS 7 host for compliance with https://www.python.org/dev/peps/pep-0599/
   
   > Downgrading to a slightly older version of mxnet seems to work:
   > 
   > $ pip install mxnet==1.6.0b20200127
   > 
   > The libmxnet.so file in 1.6.0 is very broken. Not sure why the python3.8 interpreter doesn't crash. It also calls Py_FinalizeEx.
   
   https://github.com/PyO3/pyo3/issues/1044#issuecomment-660469745
   
   @m-ou-se can you provide more details on why the file is broken? Does your statement also apply to the latest versions at https://dist.mxnet.io/python/cpu , for example https://repo.mxnet.io/dist/python/cpu/mxnet-2.0.0b20200720-py2.py3-none-manylinux2014_x86_64.whl ?





[GitHub] [incubator-mxnet] leezu commented on issue #18754: Python 3.8 wheel appears to be slightly corrupted

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18754:
URL: https://github.com/apache/incubator-mxnet/issues/18754#issuecomment-661251800


   @davidhewitt can you follow https://mxnet.apache.org/get_started/build_from_source and verify if the same warnings apply to the `libmxnet.so` generated when following the guide?





[GitHub] [incubator-mxnet] leezu closed issue #18754: Python 3.8 wheel appears to be slightly corrupted

Posted by GitBox <gi...@apache.org>.
leezu closed issue #18754:
URL: https://github.com/apache/incubator-mxnet/issues/18754


   





[GitHub] [incubator-mxnet] leezu edited a comment on issue #18754: Python 3.8 wheel appears to be slightly corrupted

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #18754:
URL: https://github.com/apache/incubator-mxnet/issues/18754#issuecomment-662042770


   Thank you for trying that @davidhewitt. The difference between the libmxnet.so in the pip wheel and the one obtained via the build_from_source guide is that in the pip wheel a number of dependencies are statically linked to enable portability across various Linux distributions and versions. The wheel is built on a CentOS 7 host for compliance with https://www.python.org/dev/peps/pep-0599/
   
   > Downgrading to a slightly older version of mxnet seems to work:
   > 
   > $ pip install mxnet==1.6.0b20200127
   > 
   > The libmxnet.so file in 1.6.0 is very broken. Not sure why the python3.8 interpreter doesn't crash. It also calls Py_FinalizeEx.
   
   https://github.com/PyO3/pyo3/issues/1044#issuecomment-660469745
   
   @m-ou-se can you provide more details on why the file is broken? Specifically, I'm not sure if the issue at hand is really a build issue or rather a bug in the MXNet backend. Does your statement also apply to the latest versions at https://dist.mxnet.io/python/cpu , for example https://repo.mxnet.io/dist/python/cpu/mxnet-2.0.0b20200720-py2.py3-none-manylinux2014_x86_64.whl ?
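The two wheels named in this thread differ in their platform tag: pip reported `mxnet-1.6.0-py2.py3-none-any.whl` for the affected install, while the nightly linked above is tagged `manylinux2014_x86_64`. As a small illustration of how those tags are read out of a wheel filename, the sketch below splits a name according to the wheel naming convention (distribution, version, optional build tag, then Python, ABI and platform tags). `parse_wheel_filename` is a hypothetical helper written for this example, not part of pip or MXNet, and whether the `any` tag has any bearing on the warnings is not established in this thread.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Five components, or six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components in {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        # Compressed tag sets are separated by dots, e.g. "py2.py3".
        "python_tags": python_tag.split("."),
        "abi_tags": abi_tag.split("."),
        "platform_tags": platform_tag.split("."),
    }


for name in (
    "mxnet-1.6.0-py2.py3-none-any.whl",
    "mxnet-2.0.0b20200720-py2.py3-none-manylinux2014_x86_64.whl",
):
    info = parse_wheel_filename(name)
    print(info["version"], info["platform_tags"])
```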





[GitHub] [incubator-mxnet] leezu commented on issue #18754: Python 3.8 wheel appears to be slightly corrupted

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18754:
URL: https://github.com/apache/incubator-mxnet/issues/18754#issuecomment-663159928


   The crash is fixed by https://github.com/apache/incubator-mxnet/pull/18768
   
   I'm not sure about the `readelf` warnings, but currently I don't see any evidence that they are harmful (we're just using cmake to statically link a number of libraries into `libmxnet.so`). So I'll close this issue. Please comment and reopen if any action needs to be taken from the build perspective.





[GitHub] [incubator-mxnet] davidhewitt commented on issue #18754: Python 3.8 wheel appears to be slightly corrupted

Posted by GitBox <gi...@apache.org>.
davidhewitt commented on issue #18754:
URL: https://github.com/apache/incubator-mxnet/issues/18754#issuecomment-661716868


   When following the guide, I was able to build a `libmxnet.so` which did not have these warnings. I built it on Ubuntu 20.04 under WSL2 and had to disable CUDA support.

