Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/07/08 20:04:11 UTC

[GitHub] [incubator-mxnet] avivna opened a new issue #15486: mxnet profiler yields exception when multithreaded mxndarray IO operations occur in the background

URL: https://github.com/apache/incubator-mxnet/issues/15486
 
 
   ## Description:
   Activating the MXNet profiler while mx.ndarray IO operations run in background threads results in the following exception:
   Check failed: !thread_profiling_data.calls_.empty()
   
   A short script that reproduces this exception is attached to this issue.
   The script's flow:
   [1] Predefined mx.ndarrays (8 arrays by default) are saved to disk.
   [2] 8 threads are started; each thread loads one of the arrays saved in [1] from disk in an infinite loop.
   [3] Once the threads are running, the main thread activates the MXNet profiler iteratively (i.e. a small code section that includes a simple mx.ndarray operation is profiled n=20 times).
   [4] After profiling ends, all threads are stopped and joined to the main thread.
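
The attached script is not included in this message. As a rough illustration only, the flow in [1]-[4] could be sketched as below. The names `run_repro` and `load_forever`, the array shape, the file names, and the use of `mx.nd.dot` as the "simple operation" are all assumptions and are not taken from the original script. The `mxnet` module is passed in as `mx` so that defining the function does not itself import MXNet.

```python
import os
import tempfile
import threading


def load_forever(mx, path, stop_event):
    """Worker: reload one saved array from disk until asked to stop."""
    while not stop_event.is_set():
        mx.nd.load(path)


def run_repro(mx, num_arrays=8, iterations=20, workdir=None):
    """Follow steps [1]-[4]; `mx` is the imported mxnet module.

    Returns the number of profiling iterations that completed.
    """
    workdir = workdir or tempfile.mkdtemp()

    # [1] Save predefined arrays to disk (shape chosen arbitrarily).
    paths = []
    for i in range(num_arrays):
        path = os.path.join(workdir, "array_%d.nd" % i)
        mx.nd.save(path, [mx.nd.ones((64, 64))])
        paths.append(path)

    # [2] One background thread per array, each loading in a loop.
    stop_event = threading.Event()
    threads = [
        threading.Thread(target=load_forever, args=(mx, p, stop_event))
        for p in paths
    ]
    for t in threads:
        t.daemon = True
        t.start()

    completed = 0
    try:
        # [3] Profile a small ndarray operation repeatedly on the main thread.
        mx.profiler.set_config(
            profile_all=True,
            filename=os.path.join(workdir, "profile.json"),
        )
        for _ in range(iterations):
            mx.profiler.set_state("run")
            a = mx.nd.ones((64, 64))
            mx.nd.dot(a, a).wait_to_read()
            mx.profiler.set_state("stop")
            completed += 1
    finally:
        # [4] Stop the loader threads and join them to the main thread.
        stop_event.set()
        for t in threads:
            t.join()
    return completed
```

With MXNet installed, this would be invoked as `import mxnet as mx; run_repro(mx)`. If the failure described here occurs, the process is terminated inside the profiling loop and the function never returns.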
   
   In practice, during one of the first few iterations in [3], the following exception is thrown right after the 'mx.profiler.set_state('run')' command:
   
   terminate called after throwing an instance of 'dmlc::Error'
     what():  [12:45:00] /local/p4clients/pkgbuild-WfMlS/workspace/build/IhmMXNet/IhmMXNet-1.x.1530.0/AL2012/DEV.STD.PTHREAD/build/private/src/src/c_api/c_api_profile.cc:145: Check failed: !thread_profiling_data.calls_.empty() 
   
        
   ## Environment info (Required)
   
   Python 2.7.8 (default, Apr 28 2019, 07:40:25) 
   [GCC 4.9.4] on linux2
   Type "help", "copyright", "credits" or "license" for more information.
   >>> execfile('diagnose.py')
   ----------Python Info----------
   ('Version      :', '2.7.8')
   ('Compiler     :', 'GCC 4.9.4')
   ('Build        :', ('default', 'Apr 28 2019 07:40:25'))
   ('Arch         :', ('64bit', 'ELF'))
   ------------Pip Info-----------
   No corresponding pip install for current python.
   ----------MXNet Info-----------
   ('Version      :', '1.4.1')
   ('Directory    :', '/home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/python2.7/site-packages/mxnet')
   Hashtag not found. Not installed from pre-built package.
   ----------System Info----------
   ('Platform     :', 'Linux-4.9.152-0.1.ac.221.79.329.metal1.x86_64-x86_64-with-redhat-5.3-Tikanga')
   ('system       :', 'Linux')
   ('node         :', 'ihm-training-p2-2b-b009d323.us-west-2.amazon.com')
   ('release      :', '4.9.152-0.1.ac.221.79.329.metal1.x86_64')
   ('version      :', '#1 SMP Thu Mar 14 16:47:05 UTC 2019')
   ----------Hardware Info----------
   ('machine      :', 'x86_64')
   ('processor    :', 'x86_64')
   Architecture:          x86_64
   CPU op-mode(s):        32-bit, 64-bit
   Byte Order:            Little Endian
   CPU(s):                32
   On-line CPU(s) list:   0-31
   Thread(s) per core:    2
   Core(s) per socket:    16
   Socket(s):             1
   NUMA node(s):          1
   Vendor ID:             GenuineIntel
   CPU family:            6
   Model:                 79
   Stepping:              1
   CPU MHz:               2708.789
   BogoMIPS:              4600.08
   Hypervisor vendor:     Xen
   Virtualization type:   full
   L1d cache:             32K
   L1i cache:             32K
   L2 cache:              256K
   L3 cache:              46080K
   NUMA node0 CPU(s):     0-31
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0093 sec, LOAD: 0.6038 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0186 sec, LOAD: 0.3516 sec.
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0440 sec, LOAD: 0.2027 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0203 sec, LOAD: 0.0903 sec.
   Error open Gluon Tutorial(en): http://gluon.mxnet.io, <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>, DNS finished in 0.0548281669617 sec.
   Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>, DNS finished in 0.697346925735 sec.
   
   
   Package used (Python):
   I'm using mxnet, time, os, threading
   
   ## Build info (Required if built from source)
   
   Compiler: GCC 4.9.4
   MXNet commit hash:
   ('Version      :', '1.4.1')
   
   Build config:
   (Paste the content of config.mk, or the build command.)
   
   ## Error Message:
   terminate called after throwing an instance of 'dmlc::Error'
     what():  [12:45:00] /local/p4clients/pkgbuild-WfMlS/workspace/build/IhmMXNet/IhmMXNet-1.x.1530.0/AL2012/DEV.STD.PTHREAD/build/private/src/src/c_api/c_api_profile.cc:145: Check failed: !thread_profiling_data.calls_.empty() 
   
   Stack trace returned 10 entries:
   [bt] (0) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/python2.7/site-packages/mxnet/../../../libmxnet.so(dmlc::StackTrace()+0x3d) [0x7f192c987edd]                           
   [bt] (1) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/python2.7/site-packages/mxnet/../../../libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f192c98823a]    
   [bt] (2) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/python2.7/site-packages/mxnet/../../../libmxnet.so(mxnet::on_exit_api()+0x4b7) [0x7f192c9bb327]                        
   [bt] (3) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/python2.7/site-packages/mxnet/../../../libmxnet.so(MXNDArrayLoad+0x3ea) [0x7f192c98202a]                               
   [bt] (4) /home/avivna/brazil-pkg-cache/packages/Python27/Python27-1.0.252183.0/AL2012/DEV.STD.PTHREAD/build/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7f19a4c12320]                                                                                                         
   [bt] (5) /home/avivna/brazil-pkg-cache/packages/Python27/Python27-1.0.252183.0/AL2012/DEV.STD.PTHREAD/build/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x148) [0x7f19a4c11498]                                                                                                               
   [bt] (6) /home/avivna/brazil-pkg-cache/packages/Python27/Python27-1.0.252183.0/AL2012/DEV.STD.PTHREAD/build/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x292) [0x7f19a4c090f2]                                                                                                       
   [bt] (7) /home/avivna/brazil-pkg-cache/packages/Python27/Python27-1.0.252183.0/AL2012/DEV.STD.PTHREAD/build/lib/python2.7/lib-dynload/_ctypes.so(+0x9fe4) [0x7f19a4bfffe4]                                                                                                                      
   [bt] (8) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f19a66f2483]                                                               
   [bt] (9) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3b66) [0x7f19a67a6466]
       
   ## Minimum reproducible example
   (If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)
   
   ## Steps to reproduce
   See attached script
   
   Code was run on Python 2.7.
   1. Start a Python 2.7 interpreter.
   2. Run: execfile('mxnet_profiling_exception.py')
   
   ## What have you tried to solve it?
   
   1. The issue does not reproduce if all MXNet IO operations are stopped before the profiler is activated.
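
Building on that observation, one possible way to keep loads from being in flight when the profiler state changes is to serialize both behind a shared lock. This is an untested sketch: `mxnet_api_lock`, `guarded_load`, and `guarded_set_state` are hypothetical names, and whether this avoids the check failure has not been verified against MXNet. It also assumes every background thread calls MXNet only through the guarded helper.

```python
import threading

# Hypothetical lock shared by every thread that calls into MXNet.
mxnet_api_lock = threading.Lock()


def guarded_load(mx, path):
    """Load arrays from `path` while holding the shared lock."""
    with mxnet_api_lock:
        return mx.nd.load(path)


def guarded_set_state(mx, state):
    """Switch profiler state only when no guarded load is in flight."""
    with mxnet_api_lock:
        mx.profiler.set_state(state)
```

The loader threads would call `guarded_load(mx, path)` instead of `mx.nd.load(path)`, and the main thread would call `guarded_set_state(mx, 'run')` and `guarded_set_state(mx, 'stop')` around the profiled section. The cost is that loads briefly block while the profiler is toggled.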
   
