
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/20 09:22:34 UTC

[GitHub] [incubator-mxnet] nabulsi opened a new issue #18759: Segmentation Fault 11

nabulsi opened a new issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759


   ## Description
   >>> import mxnet
   
   Segmentation fault: 11
   
   Aborted (core dumped)
   ### Error Message
   (Paste the complete error message. Please also include stack trace by setting environment variable `DMLC_LOG_STACK_TRACE_DEPTH=10` before running your script.)
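One way to collect the trace requested above is to run the import in a child process, so the crash does not take down the shell or interpreter used for diagnosis. The following is a minimal sketch and not part of the original report: `run_import` is a hypothetical helper name, and it assumes that any stack trace is written to the child's stderr. On POSIX systems a negative return code means the child was killed by that signal number.

```python
import os
import subprocess
import sys


def run_import(module, trace_depth=10):
    """Import `module` in a child interpreter and report how it exited.

    Hypothetical helper: returns (returncode, stderr_text).
    """
    env = dict(os.environ)
    # Variable named in the issue template; requests a deeper stack trace.
    env["DMLC_LOG_STACK_TRACE_DEPTH"] = str(trace_depth)
    proc = subprocess.run(
        # -X faulthandler asks CPython to dump the Python traceback
        # when a fatal signal such as SIGSEGV arrives.
        [sys.executable, "-X", "faulthandler", "-c", "import " + module],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr
```

Calling `run_import("mxnet")` on the affected device would return the exit status together with whatever the crashing process printed, which can then be pasted into the report.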
   
   ## To Reproduce
   (If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)
   
   ### Steps to reproduce
   (Paste the commands you ran that produced the error.)
   
   1.
   2.
   
   ## What have you tried to solve it?
   
   1.
   2.
   
   ## Environment
   - I followed the steps to build MXNet from source as described here ( https://mxnet.apache.org/get_started/jetson_setup ). I used Docker running on an AWS EC2 instance with the Deep Learning AMI (i.e. Docker, MXNet, CUDA, etc. are all preinstalled). I then downloaded the generated libmxnet.so file to the Jetson Nano.
   
   Next, on the Jetson Nano:
   - git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
   
   - added the following to ~/.bashrc:
   export PATH=/usr/local/cuda/bin:$PATH
   export MXNET_HOME=$HOME/mxnet/
   export PYTHONPATH=$MXNET_HOME/python:$PYTHONPATH
   
   source ~/.bashrc
   - cd $MXNET_HOME/python
   sudo pip3 install -e .
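Because the library in the steps above was built on a different machine and then copied to the Nano, one quick sanity check is whether the copied `libmxnet.so` is an AArch64 binary at all. The sketch below reads the `e_machine` field of the ELF header. It is an illustration only: `elf_machine` is a hypothetical helper, the location of `libmxnet.so` depends on where the file was copied, and a matching architecture does not by itself rule out other causes of the segfault.

```python
import struct

# e_machine values defined by the ELF specification.
EM_X86_64 = 62
EM_AARCH64 = 183


def elf_machine(path):
    """Return the e_machine field of an ELF file, or None if it is not ELF."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 (EI_DATA): 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 of the ELF header.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return machine
```

On the Nano, `elf_machine("/path/to/libmxnet.so")` would be expected to equal `EM_AARCH64`; a result of `EM_X86_64` would mean the library was built for the host rather than cross-compiled.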
   
   Then every time I import mxnet, I get the segmentation fault.
   ```
   curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
   
   ----------Python Info----------
   Version      : 3.7.7
   Compiler     : GCC 7.5.0
   Build        : ('default', 'Jun 25 2020 13:11:10')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 20.1.1
   Directory    : /home/username/python3.7/lib/python3.7/site-packages/pip
   ----------MXNet Info-----------
   
   Segmentation fault: 11
   
   Aborted (core dumped)
   ```
   
   ### Some details regarding the Jetson Nano:
   ```
   Device 0: "NVIDIA Tegra X1"
     CUDA Driver Version / Runtime Version          10.2 / 10.2
     CUDA Capability Major/Minor version number:    5.3
     Total amount of global memory:                 3956 MBytes (4148449280 bytes)
     ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
     GPU Max Clock rate:                            922 MHz (0.92 GHz)
     Memory Clock rate:                             13 Mhz
     Memory Bus Width:                              64-bit
     L2 Cache Size:                                 262144 bytes
     Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
     Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
     Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
     Total amount of constant memory:               65536 bytes
     Total amount of shared memory per block:       49152 bytes
     Total number of registers available per block: 32768
     Warp size:                                     32
     Maximum number of threads per multiprocessor:  2048
     Maximum number of threads per block:           1024
     Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
     Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
     Maximum memory pitch:                          2147483647 bytes
     Texture alignment:                             512 bytes
     Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
     Run time limit on kernels:                     Yes
     Integrated GPU sharing Host Memory:            Yes
     Support host page-locked memory mapping:       Yes
     Alignment requirement for Surfaces:            Yes
     Device has ECC support:                        Disabled
     Device supports Unified Addressing (UVA):      Yes
     Device supports Compute Preemption:            No
     Supports Cooperative Kernel Launch:            No
     Supports MultiDevice Co-op Kernel Launch:      No
     Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
     Compute Mode:
        < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
   
   deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
   Result = PASS
   ```
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] nabulsi commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

nabulsi commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661478691


   Yes, these are the steps that I followed. As mentioned, I had also tried cross-compiling and using the generated library on the Jetson, but got the same error. I am currently reflashing my SD card with the original image and will retry there to see whether any changes I made are causing the problem.



[GitHub] [incubator-mxnet] leezu commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

leezu commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-662076101


   Following the cross-build instructions locally was blocked for a few months because the toolchain files were not public. NVIDIA has now provided some files, but cuDNN, for example, is missing. @TristonC is tracking that internally and may update the build at https://github.com/apache/incubator-mxnet/pull/18450.
   
   @mseth10 have you already verified the cross-compiled libmxnet on a device?



[GitHub] [incubator-mxnet] szha commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

szha commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661557133


   @mseth10 @leezu would we be able to offer a wheel through cross compilation?



[GitHub] [incubator-mxnet] mseth10 commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

mseth10 commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661561160


   @szha yes, it should be possible now that NVIDIA has shared their cross-compilation toolchain on their apt server.



[GitHub] [incubator-mxnet] mseth10 commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

mseth10 commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661492359


   Please let me know how it goes.



[GitHub] [incubator-mxnet] leezu commented on issue #18759: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

leezu commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661214880


   cc @mseth10 



[GitHub] [incubator-mxnet] mseth10 commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

mseth10 commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661471097


   Building on the Jetson Nano might need more swap space (>20 GB) and a very long time. I built it on a Jetson AGX Xavier and it still took a few hours.
   
   Do you mean that the following commands (from the tutorial) lead to the segfault? Which JetPack version is installed on your device? It should work with JetPack 4.4.
   ```
   sudo apt-get update
   sudo apt-get install -y git build-essential libopenblas-dev libopencv-dev python3-pip
   sudo pip3 install -U pip
   
   wget https://mxnet-public.s3.us-east-2.amazonaws.com/install/jetson/1.6.0/mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
   sudo pip3 install mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
   
   python3 -c 'import mxnet'
   ```
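To compare a device against the swap figure mentioned in this comment, the configured swap can be read from `/proc/meminfo` on Linux. This is a small sketch under assumptions: the helper names are hypothetical, and the 20 GB value is only the rough guide given above, not a documented requirement.

```python
def swap_total_gib(meminfo_text):
    """Return SwapTotal from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:        6291456 kB"
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("SwapTotal not found")


def read_swap_total_gib(path="/proc/meminfo"):
    """Read the swap size of the current machine (Linux only)."""
    with open(path) as f:
        return swap_total_gib(f.read())
```

On the device, `read_swap_total_gib() > 20` would indicate whether the swap configuration is in the range suggested here before starting a multi-hour native build.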



[GitHub] [incubator-mxnet] nabulsi commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

nabulsi commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-662740615


   @mseth10 the wheel is enough for me for now, so I can move forward. However, I am worried that in the next few days or weeks I will need something more and will have to cross-compile. It would be great if you could check it when you have some time. Also, I noticed that a few days ago [Makefile support was removed](https://github.com/apache/incubator-mxnet/pull/18721), so the build instructions on this [doc page](https://mxnet.apache.org/get_started/jetson_setup) are no longer valid.  
   Thanks!



[GitHub] [incubator-mxnet] mseth10 commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

mseth10 commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-662716632


   @nabulsi that's great news. I have not yet tested the cross-compilation script provided on the installation page, and it might need some fixing. Until that is done, are you currently blocked on anything, i.e. is there anything you are unable to do with the provided wheel?



[GitHub] [incubator-mxnet] mseth10 commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

mseth10 commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661444177


   Hi @nabulsi, thanks for raising this issue. I have tested the instructions ([here](https://mxnet.apache.org/get_started/jetson_setup)) for building MXNet on a Jetson module myself, and they work fine. There is also a tutorial that you can follow:
   https://mxnet.apache.org/api/python/docs/tutorials/deploy/inference/image_classification_jetson.html
   This tutorial also contains a link to a (third-party) MXNet wheel that you can use directly. The wheel was built using the steps on the installation page.



[GitHub] [incubator-mxnet] nabulsi commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

nabulsi commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661461820


   Hi @mseth10, thanks for working on the issue.
   I just tried the wheel that you mentioned and got the same error message:
   ```
   root@user-jetson2:/home/user# python
   Python 3.7.7 (default, Jun 25 2020, 13:11:10) 
   [GCC 7.5.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import mxnet
   
   Segmentation fault: 11
   
   Segmentation fault (core dumped)
   ```
   Did you try using Python 3.7?
   
   Another thing: for me, building MXNet from source on the Jetson Nano usually does not finish. After about 2.5 hours the build just stops, even though I have a 6 GB swap file :(



[GitHub] [incubator-mxnet] nabulsi commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET

Posted by GitBox <gi...@apache.org>.

nabulsi commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-662225505


   **Update: the problem no longer occurs.**  
   
   I used a fresh image provided by NVIDIA for my Nano and then went through all the steps again (installed Python 3.7, installed dependencies, etc.). Then I used the wheel mentioned above to install MXNet 1.6:
   ```
   wget https://mxnet-public.s3.us-east-2.amazonaws.com/install/jetson/1.6.0/mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
   sudo pip3 install mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
   ```
   After that I compiled OpenCV 4.4.0 from source. The segmentation faults no longer occur.
   
   That said, I would appreciate your guidance on one thing:
   I tried to cross-compile MXNet 2.0.0 as described [here](https://mxnet.apache.org/get_started/jetson_setup), i.e. by using Docker on an EC2 instance (P3) with the Deep Learning AMI:
   `$MXNET_HOME/ci/build.py -p jetson`
   
   While I was able to import the generated library on the Nano, I got errors when trying to use it:
   ```
   import mxnet as mx
   a = mx.nd.ones((2, 3), mx.gpu())
   b = a * 2 + 1
   b.asnumpy()
   
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/jetson/mxnet/python/mxnet/ndarray/ndarray.py", line 2570, in asnumpy
       ctypes.c_size_t(data.size)))
     File "/home/jetson/mxnet/python/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: Traceback (most recent call last):
     File "/work/mxnet/src/operator/numpy/../tensor/../../common/../operator/mxnet_op.h", line 1132
   Name: Check failed: err == cudaSuccess (209 vs. 0) : mxnet_generic_kernel ErrStr:no kernel image is available for execution on the device
   ```
   Running `cuobjdump /path/to/libmxnet.so` shows the architecture as `arch = sm_52`, whereas the Nano is `sm_53`.
   
   How can I cross-compile for the Nano on an EC2 instance?
   
   Thanks!
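The architecture comparison described in this comment can be automated by scanning saved `cuobjdump` output for `arch = sm_XX` lines. The sketch below is deliberately naive and rests on assumptions: the function names are hypothetical, the only output format relied on is the `arch = sm_52` line quoted above, and it checks for an exact match without distinguishing embedded binary code from PTX.

```python
import re


def embedded_archs(cuobjdump_text):
    """Collect the distinct sm_XX numbers from 'arch = sm_XX' lines."""
    found = re.findall(r"arch\s*=\s*sm_(\d+)", cuobjdump_text)
    return sorted({int(n) for n in found})


def has_code_for(cuobjdump_text, capability):
    """True if the listing mentions exactly this compute capability.

    `capability` is written without the dot, e.g. 53 for the Nano's 5.3.
    """
    return capability in embedded_archs(cuobjdump_text)
```

For the library described here, `embedded_archs` would report only 52, so `has_code_for(text, 53)` would be false, which is consistent with the "no kernel image is available" error in the traceback.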

