Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/03/15 20:10:50 UTC
[GitHub] [incubator-mxnet] Shumakriss opened a new issue #14440: Seg fault during mxnet.Convolution() after reshape()
URL: https://github.com/apache/incubator-mxnet/issues/14440
## Description
I get a seg fault in Convolution() when the inputs have been passed through reshape() first, even though the reshaped input appears identical to an array created directly with the same shape.
## Environment info (Required)
macOS Mojave
Version 10.14.3
Processor Intel Core i7
Memory 16GB
Graphics Radeon Pro 455 2048 MB
diagnose.py output:
----------Python Info----------
Version : 3.6.8
Compiler : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
Build : ('default', 'Dec 29 2018 19:04:46')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.0.3
Directory : /anaconda3/envs/mxnet-wtf-3.7/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.2.1
Directory : /anaconda3/envs/mxnet-wtf-3.7/lib/python3.6/site-packages/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Darwin-18.2.0-x86_64-i386-64bit
system : Darwin
node : irbt-7046
release : 18.2.0
version : Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64
----------Hardware Info----------
machine : x86_64
processor : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0112 sec, LOAD: 0.7117 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0317 sec, LOAD: 0.3739 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error _ssl.c:835: The handshake operation timed out>, DNS finished in 0.02411198616027832 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0168 sec, LOAD: 0.5562 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0119 sec, LOAD: 0.1559 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0116 sec, LOAD: 6.0567 sec.
Python 3.6.8 :: Anaconda, Inc.
conda list output:
Name Version Build Channel
_mutex_mxnet 0.0.40 mkl
asn1crypto 0.24.0 py36_0
blas 1.0 mkl
bzip2 1.0.6 h1de35cc_5
ca-certificates 2019.1.23 0 anaconda
cairo 1.14.12 hc4e6be7_4
certifi 2019.3.9 py36_0 anaconda
cffi 1.12.2 py36hb5b8e2f_1
chardet 3.0.4 py36_1
cryptography 2.6.1 py36ha12b0ac_0
cycler 0.10.0 py36hfc81398_0
ffmpeg 3.4 h8a2ae75_0
fontconfig 2.13.0 h5d5b041_1
freetype 2.9.1 hb4e5f40_0
gettext 0.19.8.1 h15daf44_3
glib 2.56.2 hd9629dc_0
graphite2 1.3.13 h2098e52_0
harfbuzz 1.8.8 hb8d4a28_0
hdf5 1.10.2 hfa1e0ec_1
icu 58.2 h4b95b61_1
idna 2.6 py36h8628d0a_1
imutils 0.5.2 <pip>
intel-openmp 2018.0.3 0
jasper 2.0.14 h636a363_1
jpeg 9b he5867d9_2
kiwisolver 1.0.1 py36h0a44026_0
libcxx 4.0.1 hcfea43d_1
libcxxabi 4.0.1 hcfea43d_1
libedit 3.1.20181209 hb402a30_0
libffi 3.2.1 h475c297_4
libgfortran 3.0.1 h93005f0_2
libiconv 1.15 hdd342a3_7
libmklml 2018.0.3 0
libmxnet 1.2.1 mkl_hbefbd7c_1
libopencv 3.4.1 h14a57ad_3
libopus 1.3 h1de35cc_0
libpng 1.6.36 ha441bb4_0
libprotobuf 3.5.2 h2cd40f5_0
libtiff 4.0.10 hcb84e12_2
libvpx 1.7.0 h378b8a2_0
libxml2 2.9.9 hab757c2_0
matplotlib 3.0.2 py36h54f8f79_0 anaconda
mkl 2019.1 144
mkl-dnn 0.14 h04f5b5a_0
mxnet 1.2.1 h8cc8929_0
ncurses 6.1 h0a44026_1
numpy 1.14.2 py36ha9ae307_0
olefile 0.46 py36_0
opencv 3.4.1 py36h6fd60c2_3 anaconda
openssl 1.1.1 h1de35cc_0 anaconda
pandas 0.24.2 <pip>
pcre 8.43 h0a44026_0
pillow 5.4.1 py36hb68e598_0 anaconda
pip 19.0.3 py36_0 anaconda
pixman 0.38.0 h1de35cc_0
py-mxnet 1.2.1 py36h4bd7469_0
py-opencv 3.4.1 py36h4b7e113_3
pycparser 2.19 py36_0
pyopenssl 19.0.0 py36_0
pyparsing 2.3.1 py36_0
pysocks 1.6.8 py36_0
python 3.6.8 haf84260_0
python-dateutil 2.8.0 py36_0
pytz 2018.9 py36_0
readline 7.0 h1de35cc_5
requests 2.18.4 py36h4516966_1
scipy 1.2.1 py36h1410ff5_0 anaconda
setuptools 40.8.0 py36_0
six 1.12.0 py36_0
sqlite 3.27.2 ha441bb4_0
tk 8.6.8 ha441bb4_0
tornado 6.0.1 py36h1de35cc_0
urllib3 1.22 py36h68b9469_0
wheel 0.33.1 py36_0
xz 5.2.4 h1de35cc_4
zlib 1.2.11 h1de35cc_3
zstd 1.3.7 h5bba6e5_0
## Error Message:
Segmentation fault: 11
Stack trace returned 10 entries:
[bt] (0) 0 libmxnet.so 0x00000001074a4591 dmlc::StackTrace() + 305
[bt] (1) 1 libmxnet.so 0x0000000108afc326 mxnet::SegfaultLogger(int) + 54
[bt] (2) 2 libsystem_platform.dylib 0x00007fff5b9c2b3d _sigtramp + 29
[bt] (3) 3 libsystem_malloc.dylib 0x00007fff5b9884a4 tiny_malloc_from_free_list + 445
[bt] (4) 4 libmxnet.so 0x00000001074c4bc0 mxnet::GetWeights(mxnet::NDArray const&, mkldnn::memory::primitive_desc const&, int) + 48
[bt] (5) 5 libmxnet.so 0x00000001074d497d mxnet::op::MKLDNNConvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&) + 2429
[bt] (6) 6 libmxnet.so 0x00000001076fd810 mxnet::op::ConvolutionComputeExCPU(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&) + 240
[bt] (7) 7 libmxnet.so 0x0000000108781f4c mxnet::imperative::PushFComputeEx(std::__1::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::Resource, std::__1::allocator<mxnet::Resource> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&)::'lambda'(mxnet::RunContext)::operator()(mxnet::RunContext) const + 236
[bt] (8) 8 libmxnet.so 0x000000010872b4b0 std::__1::__function::__func<mxnet::engine::ThreadedEngine::PushSync(std::__1::function<void (mxnet::RunContext)>, mxnet::Context, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)::$_1, std::__1::allocator<mxnet::engine::ThreadedEngine::PushSync(std::__1::function<void (mxnet::RunContext)>, mxnet::Context, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)::$_1>, void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>::operator()(mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&) + 64
[bt] (9) 9 libmxnet.so 0x000000010871ef27 mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*) + 727
## Minimum reproducible example
import os
import mxnet as mx
import numpy as np
import time
import pickle
data = mx.nd.zeros(shape=(1, 1, 162, 94)).astype(np.float32)
weight = mx.nd.zeros(shape=(2, 1, 181, 181)).astype(np.float32)
i = mx.nd.zeros(shape=(162, 94)).astype(np.float32)
k = mx.nd.zeros(shape=(2, 181, 181)).astype(np.float32)
reshaped_data = i.reshape([1, 1, 162, 94])
reshaped_weight = k.reshape([2, 1, 181, 181])
with open("data.txt", 'w+') as data_file:
    data_file.write(str(pickle.dumps(data, protocol=0)))
with open("reshape.txt", 'w+') as data_file:
    data_file.write(str(pickle.dumps(reshaped_data, protocol=0)))
mx.nd.Convolution(data=reshaped_data,
                  weight=reshaped_weight,
                  kernel=(181, 181),
                  num_filter=2,
                  no_bias=True,
                  stride=(1, 1),
                  pad=(90, 90))
time.sleep(5)
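As a sanity check that the arguments in the Convolution() call are shape-consistent (so the crash is not an obviously invalid kernel/pad combination), here is a minimal sketch of the standard convolution output-size arithmetic. The helper names `conv_output_size` and `conv_output_shape` are hypothetical and not part of MXNet; the sketch assumes a dilation of 1, since the call does not set one.

```python
def conv_output_size(in_size, kernel, stride=1, pad=0, dilate=1):
    """Output length along one spatial axis (standard convolution formula)."""
    effective_kernel = dilate * (kernel - 1) + 1
    return (in_size + 2 * pad - effective_kernel) // stride + 1


def conv_output_shape(data_shape, num_filter, kernel, stride, pad):
    """Expected NCHW output shape for a 2-D convolution (hypothetical helper)."""
    batch, _channels, height, width = data_shape
    out_h = conv_output_size(height, kernel[0], stride[0], pad[0])
    out_w = conv_output_size(width, kernel[1], stride[1], pad[1])
    return (batch, num_filter, out_h, out_w)


# Values taken from the example above.
shape = conv_output_shape(
    data_shape=(1, 1, 162, 94),
    num_filter=2,
    kernel=(181, 181),
    stride=(1, 1),
    pad=(90, 90),
)
print(shape)  # (1, 2, 162, 94)
```

With pad=(90, 90) and a 181x181 kernel, the output keeps the 162x94 spatial size of the input, so the call itself looks well-formed.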
## Steps to reproduce
1. Set up conda environment with YAML file:
name: mxnet-env
channels:
  - anaconda
  - conda-forge
dependencies:
  - mxnet
  - pillow
  - matplotlib
  - scipy
  - opencv
  - pip
  - pip:
    - pandas
    - imutils
2. conda activate mxnet-env
3. python test.py
4. To confirm that reshaping is the trigger, use the "data" variable instead of the reshaped input; it does not seg fault.
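When repeating these steps across several environments, it may help to run the script in a child process so that a crash does not take down the calling shell session or test harness. The sketch below is an assumption-laden helper, not part of MXNet: `run_and_classify` is a hypothetical name, and "test.py" is the script name from step 3. Because the stack trace shows MXNet installing its own handler (mxnet::SegfaultLogger), the exit status may not reflect the signal, so the sketch also looks for the "Segmentation fault" text on stderr. The signal check assumes a POSIX system.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd in a child process and classify how it ended."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # On POSIX, a return code of -N means the child was killed by signal N.
    killed_by_segv = result.returncode == -signal.SIGSEGV
    # A custom handler may print the message and exit normally instead.
    logged_segv = b"Segmentation fault" in result.stderr
    if killed_by_segv or logged_segv:
        return "segfault"
    if result.returncode == 0:
        return "ok"
    return "failed (exit code %d)" % result.returncode


if __name__ == "__main__":
    # "test.py" is the reproduction script from step 3.
    print(run_and_classify([sys.executable, "test.py"]))
```

Running this once per environment gives a one-word result per MXNet version, which makes a comparison across versions less haphazard.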
## What have you tried to solve it?
1. I tried several versions (admittedly, somewhat haphazardly), namely 1.2.1 and 1.4, on Mac, on AWS's Ubuntu 16 AMI, and on their Ubuntu-based deep learning AMI.
2. I made the minimal reproducible code snippet above. The issue appears to be triggered by reshape(), even though the output files data.txt and reshape.txt seem identical.
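To turn "seem identical" into a definite yes or no, the two dump files can be compared byte for byte. This is a minimal sketch using only the Python standard library; `file_digest` and `files_identical` are hypothetical helper names, and the file names are the ones written by the example script.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def files_identical(path_a, path_b):
    """True if both files have exactly the same bytes."""
    return file_digest(path_a) == file_digest(path_b)


if __name__ == "__main__":
    # data.txt and reshape.txt are produced by the reproduction script.
    if os.path.exists("data.txt") and os.path.exists("reshape.txt"):
        print(files_identical("data.txt", "reshape.txt"))
```

Note that a match only shows that the serialized contents agree. It says nothing about how each array is held in memory inside MXNet, which is where a difference between the reshaped and non-reshaped inputs would have to be.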