Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/20 09:22:34 UTC
[GitHub] [incubator-mxnet] nabulsi opened a new issue #18759: Segmentation Fault 11
nabulsi opened a new issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759
## Description
>> import mxnet
Segmentation fault: 11
Aborted (core dumped)
### Error Message
(Paste the complete error message. Please also include stack trace by setting environment variable `DMLC_LOG_STACK_TRACE_DEPTH=10` before running your script.)
## To Reproduce
(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)
### Steps to reproduce
(Paste the commands you ran that produced the error.)
1.
2.
## What have you tried to solve it?
1.
2.
## Environment
- I followed the steps to build MXNet from source described here ( https://mxnet.apache.org/get_started/jetson_setup ). I used Docker on an AWS EC2 instance with the Deep Learning AMI (i.e. Docker, MXNet, CUDA, etc. are all built in). I then downloaded the generated libmxnet.so file to the Jetson Nano.
Next, on the Jetson Nano:
- git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
- added the following to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export MXNET_HOME=$HOME/mxnet/
export PYTHONPATH=$MXNET_HOME/python:$PYTHONPATH
source ~/.bashrc
- cd $MXNET_HOME/python
sudo pip3 install -e .
Then every time I import mxnet I get the segmentation fault.
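With both `PYTHONPATH` and an editable `pip3 install -e .` in play, it can help to confirm which copy of the package Python would actually pick up. The following is a minimal sketch, not part of the original report; the helper name `locate_package` is hypothetical. It relies on `importlib.util.find_spec`, which for a top-level name locates the package without executing it, so it should be safe to run even when `import mxnet` itself crashes.

```python
import importlib.util
import os


def locate_package(name):
    """Return the directory a top-level package would be imported from, or None.

    find_spec() on a top-level name does not execute the package, so this
    can be called even when the import itself crashes.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(os.path.abspath(spec.origin))


if __name__ == "__main__":
    # Shows whether the source checkout or an installed wheel would be used.
    print("mxnet resolves to:", locate_package("mxnet"))
    print("PYTHONPATH:", os.environ.get("PYTHONPATH", "(unset)"))
```

If the printed directory is not the one that contains the intended libmxnet.so, the crash may come from a different build than the one being investigated.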
```
curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
----------Python Info----------
Version : 3.7.7
Compiler : GCC 7.5.0
Build : ('default', 'Jun 25 2020 13:11:10')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 20.1.1
Directory : /home/username/python3.7/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Segmentation fault: 11
Aborted (core dumped)
```
### Some details regarding the Jetson Nano:
```
Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3956 MBytes (4148449280 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-mxnet] nabulsi commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
nabulsi commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661478691
Yes, these are the steps that I followed. As I said, I had also tried cross-compiling and using the generated library on the Jetson, but got the same issue. I am currently re-flashing my SD card with the original image and will retry on it to see whether any changes I made are causing the problem.
[GitHub] [incubator-mxnet] leezu commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-662076101
Following the cross-build instructions locally was blocked for a few months due to non-public toolchain files. NVIDIA has now provided some files, but cuDNN, for example, is still missing. @TristonC is tracking that internally and may update the build at https://github.com/apache/incubator-mxnet/pull/18450.
@mseth10 have you already verified the cross-compiled libmxnet on a device?
[GitHub] [incubator-mxnet] szha commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
szha commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661557133
@mseth10 @leezu would we be able to offer a wheel through cross compilation?
[GitHub] [incubator-mxnet] mseth10 commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
mseth10 commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661561160
@szha yeah, it should be possible now that NVIDIA has shared their cross-compilation toolchain on an apt server.
[GitHub] [incubator-mxnet] mseth10 commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
mseth10 commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661492359
Please let me know how it goes.
[GitHub] [incubator-mxnet] leezu commented on issue #18759: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661214880
cc @mseth10
[GitHub] [incubator-mxnet] mseth10 commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
mseth10 commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661471097
Building on the Jetson Nano might need more swap space (>20 GB) and a very long time. I built it on a Jetson Xavier AGX and it still took a few hours.
Do you mean that the following code (from the tutorial) segfaults? Which JetPack version is installed on your device? It should work with JetPack 4.4.
```
sudo apt-get update
sudo apt-get install -y git build-essential libopenblas-dev libopencv-dev python3-pip
sudo pip3 install -U pip
wget https://mxnet-public.s3.us-east-2.amazonaws.com/install/jetson/1.6.0/mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
sudo pip3 install mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
python3 -c 'import mxnet'
```
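Before starting a multi-hour on-device build, the swap figure mentioned above can be checked quickly. This is a rough sketch that is not from the thread: it parses `SwapTotal` from `/proc/meminfo` (Linux only), the function name `swap_total_gib` is hypothetical, and the 20 GB threshold is only the suggestion made in this comment, not a documented requirement.

```python
import os


def swap_total_gib(meminfo_text):
    """Return SwapTotal from /proc/meminfo-style text in GiB, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib / (1024 * 1024)
    return None


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        swap = swap_total_gib(f.read())
    suggested = 20  # GiB; assumption taken from the comment above
    if swap is None:
        print("SwapTotal not found in /proc/meminfo")
    else:
        print("swap: %.1f GiB (suggested in this thread: > %d)" % (swap, suggested))
```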
[GitHub] [incubator-mxnet] nabulsi commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
nabulsi commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-662740615
@mseth10 the wheel is enough for me for now. I can move forward, but I am worried that in the next few days or weeks I may find that I need something more and will have to cross-compile. It would be great if you could check it when you have some time. Also, I noticed that a few days ago [support for the Makefile build was removed](https://github.com/apache/incubator-mxnet/pull/18721), and consequently the build instructions on this [doc page](https://mxnet.apache.org/get_started/jetson_setup) are no longer valid.
Thanks!
[GitHub] [incubator-mxnet] mseth10 commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
mseth10 commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-662716632
@nabulsi that's great news. I have not yet tested the cross compilation script provided on the installation page, and it might need some fixing. Until that is done, is there anything that you are blocked on currently, anything that you are unable to do with the wheel provided?
[GitHub] [incubator-mxnet] mseth10 commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
mseth10 commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661444177
Hi @nabulsi, thanks for raising this issue. I have tested the instructions ([here](https://mxnet.apache.org/get_started/jetson_setup)) for building MXNet on a Jetson module myself and they work fine. There is also a tutorial that you can follow:
https://mxnet.apache.org/api/python/docs/tutorials/deploy/inference/image_classification_jetson.html
This tutorial also contains a link to a (3rd party) MXNet wheel that you can use directly. This wheel was built using the steps on the installation page.
[GitHub] [incubator-mxnet] nabulsi commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
nabulsi commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-661461820
Hi @mseth10 . Thanks for working on the issue.
I just tried the method with the wheel that you mentioned. I also got the same error message:
```
root@user-jetson2:/home/user# python
Python 3.7.7 (default, Jun 25 2020, 13:11:10)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
Segmentation fault: 11
Segmentation fault (core dumped)
```
Did you try using python 3.7?
Another thing: for me, building MXNet from source on the Jetson Nano usually doesn't finish. After 2.5 hours the build just stops, although I have a 6 GB swap file :(
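To capture more than the bare "Segmentation fault" line, one option is to run the import in a child interpreter with Python's standard `faulthandler` enabled and inspect the exit status. This is a minimal sketch, not something used in the thread; `try_import` is a hypothetical helper name. Whether a Python-level traceback appears is not guaranteed, since a native library may install its own signal handler while it loads.

```python
import subprocess
import sys


def try_import(module, python=sys.executable, timeout=120):
    """Import `module` in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX a negative return code means the
    child was killed by that signal number (for example -11 for SIGSEGV or
    -6 for SIGABRT).
    """
    proc = subprocess.run(
        [python, "-X", "faulthandler", "-c", "import " + module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


# Example usage on the device:
#   rc, err = try_import("mxnet")
#   print(rc)
#   print(err)
```

Running the import in a separate process also keeps the interactive session alive, so the same check can be repeated against different interpreters by passing another path as `python`.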
[GitHub] [incubator-mxnet] nabulsi commented on issue #18759: Jetson: Segmentation Fault 11 When Importing MXNET
Posted by GitBox <gi...@apache.org>.
nabulsi commented on issue #18759:
URL: https://github.com/apache/incubator-mxnet/issues/18759#issuecomment-662225505
**This is my update: Problem is not happening any more.**
I used a fresh image provided by NVIDIA for my Nano and then went through all the steps again (installed python3.7, installed dependencies, etc.). Then I used the wheel mentioned above to install MXNet 1.6:
```
wget https://mxnet-public.s3.us-east-2.amazonaws.com/install/jetson/1.6.0/mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
sudo pip3 install mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
```
After that I compiled OpenCV 4.4.0 from source. The segmentation faults no longer occur.
That said, I would appreciate your guidance on one thing:
I tried to cross-compile MXNet 2.0.0 as described [here](https://mxnet.apache.org/get_started/jetson_setup), i.e. by using Docker on an EC2 instance (P3) with the Deep Learning AMI:
`$MXNET_HOME/ci/build.py -p jetson`
While I was able to import the generated library on the Nano, I received errors when trying to work with it:
```
import mxnet as mx
a = mx.nd.ones((2, 3), mx.gpu())
b = a * 2 + 1
b.asnumpy()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jetson/mxnet/python/mxnet/ndarray/ndarray.py", line 2570, in asnumpy
ctypes.c_size_t(data.size)))
File "/home/jetson/mxnet/python/mxnet/base.py", line 246, in check_call
raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
File "/work/mxnet/src/operator/numpy/../tensor/../../common/../operator/mxnet_op.h", line 1132
Name: Check failed: err == cudaSuccess (209 vs. 0) : mxnet_generic_kernel ErrStr:no kernel image is available for execution on the device
```
Running `cuobjdump /path/to/libmxnet.so` shows the architecture as `arch = sm_52`, whereas the Nano is `sm_53` (compute capability 5.3).
How can I cross-compile for the Nano on an EC2 instance?
Thanks!
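The `cuobjdump` check described above can be scripted so that a freshly built library is verified before it is copied to the device. The following is a sketch under assumptions: it only looks for `arch = sm_XX` lines like the one quoted in this comment, the function names are hypothetical, and an exact match on `sm_53` is a deliberately simple criterion rather than a full statement of CUDA compatibility rules.

```python
import re
import subprocess


def embedded_archs(cuobjdump_output):
    """Collect the distinct `arch = sm_XX` values from cuobjdump text output."""
    return sorted(set(re.findall(r"arch\s*=\s*(sm_\d+)", cuobjdump_output)))


def has_arch(cuobjdump_output, wanted="sm_53"):
    """True if the wanted architecture appears among the embedded ones."""
    return wanted in embedded_archs(cuobjdump_output)


def archs_in_library(path):
    """Run `cuobjdump <path>` (must be on PATH) and return the embedded archs."""
    out = subprocess.run(
        ["cuobjdump", path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    return embedded_archs(out)
```

For the library described here, `archs_in_library("/path/to/libmxnet.so")` would be expected to list `sm_52` only, which matches the reported "no kernel image is available" error on the Nano.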