Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/03/15 20:10:50 UTC

[GitHub] [incubator-mxnet] Shumakriss opened a new issue #14440: Seg fault during mxnet.Convolution() after reshape()

URL: https://github.com/apache/incubator-mxnet/issues/14440
 
 
   ## Description
   mx.nd.Convolution() seg faults when its input has been produced by reshape(), even though the reshaped input appears identical to an array created directly with the same shape.
   
   ## Environment info (Required)
   macOS Mojave
   Version 10.14.3
   Processor Intel Core i7
   Memory 16GB
   Graphics Radeon Pro 455 2048 MB
   
   
   diagnose.py output:
   ----------Python Info----------
   Version      : 3.6.8
   Compiler     : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
   Build        : ('default', 'Dec 29 2018 19:04:46')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 19.0.3
   Directory    : /anaconda3/envs/mxnet-wtf-3.7/lib/python3.6/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.2.1
   Directory    : /anaconda3/envs/mxnet-wtf-3.7/lib/python3.6/site-packages/mxnet
   Hashtag not found. Not installed from pre-built package.
   ----------System Info----------
   Platform     : Darwin-18.2.0-x86_64-i386-64bit
   system       : Darwin
   node         : irbt-7046
   release      : 18.2.0
   version      : Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64
   ----------Hardware Info----------
   machine      : x86_64
   processor    : i386
   b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz'
   b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
   b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
   b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0112 sec, LOAD: 0.7117 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0317 sec, LOAD: 0.3739 sec.
   Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error _ssl.c:835: The handshake operation timed out>, DNS finished in 0.02411198616027832 sec.
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0168 sec, LOAD: 0.5562 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0119 sec, LOAD: 0.1559 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0116 sec, LOAD: 6.0567 sec.
   
   Python 3.6.8 :: Anaconda, Inc.
   
   conda list output:
   Name                    Version                   Build  Channel
   _mutex_mxnet              0.0.40                      mkl  
   asn1crypto                0.24.0                   py36_0  
   blas                      1.0                         mkl  
   bzip2                     1.0.6                h1de35cc_5  
   ca-certificates           2019.1.23                     0    anaconda
   cairo                     1.14.12              hc4e6be7_4  
   certifi                   2019.3.9                 py36_0    anaconda
   cffi                      1.12.2           py36hb5b8e2f_1  
   chardet                   3.0.4                    py36_1  
   cryptography              2.6.1            py36ha12b0ac_0  
   cycler                    0.10.0           py36hfc81398_0  
   ffmpeg                    3.4                  h8a2ae75_0  
   fontconfig                2.13.0               h5d5b041_1  
   freetype                  2.9.1                hb4e5f40_0  
   gettext                   0.19.8.1             h15daf44_3  
   glib                      2.56.2               hd9629dc_0  
   graphite2                 1.3.13               h2098e52_0  
   harfbuzz                  1.8.8                hb8d4a28_0  
   hdf5                      1.10.2               hfa1e0ec_1  
   icu                       58.2                 h4b95b61_1  
   idna                      2.6              py36h8628d0a_1  
   imutils                   0.5.2                     <pip>
   intel-openmp              2018.0.3                      0  
   jasper                    2.0.14               h636a363_1  
   jpeg                      9b                   he5867d9_2  
   kiwisolver                1.0.1            py36h0a44026_0  
   libcxx                    4.0.1                hcfea43d_1  
   libcxxabi                 4.0.1                hcfea43d_1  
   libedit                   3.1.20181209         hb402a30_0  
   libffi                    3.2.1                h475c297_4  
   libgfortran               3.0.1                h93005f0_2  
   libiconv                  1.15                 hdd342a3_7  
   libmklml                  2018.0.3                      0  
   libmxnet                  1.2.1            mkl_hbefbd7c_1  
   libopencv                 3.4.1                h14a57ad_3  
   libopus                   1.3                  h1de35cc_0  
   libpng                    1.6.36               ha441bb4_0  
   libprotobuf               3.5.2                h2cd40f5_0  
   libtiff                   4.0.10               hcb84e12_2  
   libvpx                    1.7.0                h378b8a2_0  
   libxml2                   2.9.9                hab757c2_0  
   matplotlib                3.0.2            py36h54f8f79_0    anaconda
   mkl                       2019.1                      144  
   mkl-dnn                   0.14                 h04f5b5a_0  
   mxnet                     1.2.1                h8cc8929_0  
   ncurses                   6.1                  h0a44026_1  
   numpy                     1.14.2           py36ha9ae307_0  
   olefile                   0.46                     py36_0  
   opencv                    3.4.1            py36h6fd60c2_3    anaconda
   openssl                   1.1.1                h1de35cc_0    anaconda
   pandas                    0.24.2                    <pip>
   pcre                      8.43                 h0a44026_0  
   pillow                    5.4.1            py36hb68e598_0    anaconda
   pip                       19.0.3                   py36_0    anaconda
   pixman                    0.38.0               h1de35cc_0  
   py-mxnet                  1.2.1            py36h4bd7469_0  
   py-opencv                 3.4.1            py36h4b7e113_3  
   pycparser                 2.19                     py36_0  
   pyopenssl                 19.0.0                   py36_0  
   pyparsing                 2.3.1                    py36_0  
   pysocks                   1.6.8                    py36_0  
   python                    3.6.8                haf84260_0  
   python-dateutil           2.8.0                    py36_0  
   pytz                      2018.9                   py36_0  
   readline                  7.0                  h1de35cc_5  
   requests                  2.18.4           py36h4516966_1  
   scipy                     1.2.1            py36h1410ff5_0    anaconda
   setuptools                40.8.0                   py36_0  
   six                       1.12.0                   py36_0  
   sqlite                    3.27.2               ha441bb4_0  
   tk                        8.6.8                ha441bb4_0  
   tornado                   6.0.1            py36h1de35cc_0  
   urllib3                   1.22             py36h68b9469_0  
   wheel                     0.33.1                   py36_0  
   xz                        5.2.4                h1de35cc_4  
   zlib                      1.2.11               h1de35cc_3  
   zstd                      1.3.7                h5bba6e5_0  
   
   ## Error Message:
   Segmentation fault: 11
   
   Stack trace returned 10 entries:
   [bt] (0) 0   libmxnet.so                         0x00000001074a4591 dmlc::StackTrace() + 305
   [bt] (1) 1   libmxnet.so                         0x0000000108afc326 mxnet::SegfaultLogger(int) + 54
   [bt] (2) 2   libsystem_platform.dylib            0x00007fff5b9c2b3d _sigtramp + 29
   [bt] (3) 3   libsystem_malloc.dylib              0x00007fff5b9884a4 tiny_malloc_from_free_list + 445
   [bt] (4) 4   libmxnet.so                         0x00000001074c4bc0 mxnet::GetWeights(mxnet::NDArray const&, mkldnn::memory::primitive_desc const&, int) + 48
   [bt] (5) 5   libmxnet.so                         0x00000001074d497d mxnet::op::MKLDNNConvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&) + 2429
   [bt] (6) 6   libmxnet.so                         0x00000001076fd810 mxnet::op::ConvolutionComputeExCPU(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&) + 240
   [bt] (7) 7   libmxnet.so                         0x0000000108781f4c mxnet::imperative::PushFComputeEx(std::__1::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::NDArray, std::__1::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::Resource, std::__1::allocator<mxnet::Resource> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&)::'lambda'(mxnet::RunContext)::operator()(mxnet::RunContext) const + 236
   [bt] (8) 8   libmxnet.so                         0x000000010872b4b0 std::__1::__function::__func<mxnet::engine::ThreadedEngine::PushSync(std::__1::function<void (mxnet::RunContext)>, mxnet::Context, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)::$_1, std::__1::allocator<mxnet::engine::ThreadedEngine::PushSync(std::__1::function<void (mxnet::RunContext)>, mxnet::Context, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)::$_1>, void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>::operator()(mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&) + 64
   [bt] (9) 9   libmxnet.so                         0x000000010871ef27 mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*) + 727
   
   
   ## Minimum reproducible example
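   import pickle
   import time

   import mxnet as mx
   import numpy as np

   # Arrays created directly with the 4-D shapes that Convolution() expects.
   # Passing "data" to the call below does not seg fault.
   data = mx.nd.zeros(shape=(1, 1, 162, 94)).astype(np.float32)
   weight = mx.nd.zeros(shape=(2, 1, 181, 181)).astype(np.float32)

   # The same values created with fewer dimensions...
   i = mx.nd.zeros(shape=(162, 94)).astype(np.float32)
   k = mx.nd.zeros(shape=(2, 181, 181)).astype(np.float32)

   # ...and then reshaped to the 4-D shapes used above.
   reshaped_data = i.reshape([1, 1, 162, 94])
   reshaped_weight = k.reshape([2, 1, 181, 181])

   # Dump both versions of the input so they can be compared on disk.
   with open("data.txt", 'w+') as data_file:
       data_file.write(str(pickle.dumps(data, protocol=0)))

   with open("reshape.txt", 'w+') as data_file:
       data_file.write(str(pickle.dumps(reshaped_data, protocol=0)))

   # Seg faults when called with the reshaped arrays.
   mx.nd.Convolution(data=reshaped_data,
                     weight=reshaped_weight,
                     kernel=(181, 181),
                     num_filter=2,
                     no_bias=True,
                     stride=(1, 1),
                     pad=(90, 90))

   # Keep the process alive for a few seconds after issuing the call.
   time.sleep(5)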
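
As a sanity check on the parameters above, here is a minimal sketch of the usual convolution output-size arithmetic. The helper name conv_output_size is hypothetical (it is not an MXNet function); the shapes, kernel, stride and pad are the ones from the example. It only suggests that the geometry itself is valid; it does not explain the crash.

```python
def conv_output_size(in_size, kernel, stride=1, pad=0, dilate=1):
    """Output length of one spatial dimension for a standard convolution.

    Hypothetical helper for illustration, not part of MXNet.
    """
    effective_kernel = dilate * (kernel - 1) + 1
    return (in_size + 2 * pad - effective_kernel) // stride + 1


# Values taken from the reproducible example.
data_shape = (1, 1, 162, 94)       # (batch, channels, height, width)
weight_shape = (2, 1, 181, 181)    # (num_filter, channels, kernel_h, kernel_w)
kernel, stride, pad = (181, 181), (1, 1), (90, 90)

out_h = conv_output_size(data_shape[2], kernel[0], stride[0], pad[0])
out_w = conv_output_size(data_shape[3], kernel[1], stride[1], pad[1])
output_shape = (data_shape[0], weight_shape[0], out_h, out_w)

print(output_shape)  # (1, 2, 162, 94)
```

Note that the 181x181 kernel is larger than the 162x94 input; it only fits because the padding of 90 on each side grows the input to 342x274.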
   
   
   ## Steps to reproduce
   
   1. Set up conda environment with YAML file:
      name: mxnet-env
      channels:
        - anaconda
        - conda-forge
      dependencies:
        - mxnet
        - pillow
        - matplotlib
        - scipy
        - opencv
        - pip
        - pip:
          - pandas
          - imutils
   2. conda activate mxnet-env
   3. python test.py
   4. To check the non-reshaped case, pass the "data" variable (created directly with the 4-D shape) to Convolution() instead of "reshaped_data"; it does not seg fault.
   
   ## What have you tried to solve it?
   1. I tried several MXNet versions (admittedly, somewhat haphazardly), namely 1.2.1 and 1.4, on Mac, on AWS's Ubuntu 16 AMI, and on their Ubuntu-based deep learning AMI.
   2. I made the minimal reproducible code snippet. The issue appears to be triggered by reshape(), even though the output files data.txt and reshape.txt seem identical.
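
The "seem identical" observation can be checked mechanically rather than by eye. The following is a minimal sketch using only the Python standard library; the helper name files_identical is hypothetical, and the file names are the ones written by the snippet. A match would only show that the pickled contents are the same; it would not reveal differences in how the two arrays are held internally by the library.

```python
import filecmp
import os


def files_identical(path_a, path_b):
    """Return True if the two files have exactly the same bytes.

    shallow=False forces a content comparison instead of relying on
    os.stat() metadata such as size and modification time.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)


# Only run the comparison if the dump files from the example exist.
if os.path.exists("data.txt") and os.path.exists("reshape.txt"):
    print("identical:", files_identical("data.txt", "reshape.txt"))
```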
   
