You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/07/08 20:04:11 UTC
[GitHub] [incubator-mxnet] avivna opened a new issue #15486: mxnet profiler
yields exception when multithreaded mxndarray IO operations occur in the
background
avivna opened a new issue #15486: mxnet profiler yields exception when multithreaded mxndarray IO operations occur in the background
URL: https://github.com/apache/incubator-mxnet/issues/15486
## Description:
Activation of the mxnet profiler when mx.ndarray io operate in the background result in the following exception:
Check failed: !thread_profiling_data.calls_.empty()
A short script that recreates this exception is attached to this issue:
The script's flow:
[1] Predefined mx.ndarrays (8 arrays by default) are saved to disk
[2] 8 threads are invoked, each thread infinitely load's one of the arrays defined in [1] from disk.
[3] After threads are invoked, main thread activates the mxnet profiler iteratively (i.e. a small code section which include simple mx.ndarray operation is profiled n=20 times)
[4] After profiling ends, all threads are stopped and joined to the main thread.
In practice, during one of the first few iterations in [3], the following exception is thrown right after the 'mx.profiler.set_state('run')' command:
terminate called after throwing an instance of 'dmlc::Error'
what(): [12:45:00] /local/p4clients/pkgbuild-WfMlS/workspace/build/IhmMXNet/IhmMXNet-1.x.1530.0/AL2012/DEV.STD.PTHREAD/build/private/src/src/c_api/c_api_profile.cc:145: Check failed: !thread_profiling_data.calls_.empty()
## Environment info (Required)
Python 2.7.8 (default, Apr 28 2019, 07:40:25)
[GCC 4.9.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> execfile('diagnose.py')
----------Python Info----------
('Version :', '2.7.8')
('Compiler :', 'GCC 4.9.4')
('Build :', ('default', 'Apr 28 2019 07:40:25'))
('Arch :', ('64bit', 'ELF'))
------------Pip Info-----------
No corresponding pip install for current python.
----------MXNet Info-----------
('Version :', '1.4.1')
('Directory :', '/home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/python2.7/site-packages/mxnet')
Hashtag not found. Not installed from pre-built package.
----------System Info----------
('Platform :', 'Linux-4.9.152-0.1.ac.221.79.329.metal1.x86_64-x86_64-with-redhat-5.3-Tikanga')
('system :', 'Linux')
('node :', 'ihm-training-p2-2b-b009d323.us-west-2.amazon.com')
('release :', '4.9.152-0.1.ac.221.79.329.metal1.x86_64')
('version :', '#1 SMP Thu Mar 14 16:47:05 UTC 2019')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Stepping: 1
CPU MHz: 2708.789
BogoMIPS: 4600.08
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-31
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0093 sec, LOAD: 0.6038 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0186 sec, LOAD: 0.3516 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0440 sec, LOAD: 0.2027 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0203 sec, LOAD: 0.0903 sec.
Error open Gluon Tutorial(en): http://gluon.mxnet.io, <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>, DNS finished in 0.0548281669617 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>, DNS finished in 0.697346925735 sec.
Package used (Python):
I'm using mxnet, time, os, threading
## Build info (Required if built from source)
Compiler: GCC 4.9.4
MXNet commit hash:
('Version :', '1.4.1')
Build config:
(Paste the content of config.mk, or the build command.)
## Error Message:
terminate called after throwing an instance of 'dmlc::Error'
what(): [12:45:00] /local/p4clients/pkgbuild-WfMlS/workspace/build/IhmMXNet/IhmMXNet-1.x.1530.0/AL2012/DEV.STD.PTHREAD/build/private/src/src/c_api/c_api_profile.cc:145: Check failed: !thread_profiling_data.calls_.empty()
Stack trace returned 10 entries:
[bt] (0) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/python2.7/site-packages/mxnet/../../../libmxnet.so(dmlc::StackTrace()+0x3d) [0x7f192c987edd]
[bt] (1) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/python2.7/site-packages/mxnet/../../../libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f192c98823a]
[bt] (2) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/python2.7/site-packages/mxnet/../../../libmxnet.so(mxnet::on_exit_api()+0x4b7) [0x7f192c9bb327]
[bt] (3) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/python2.7/site-packages/mxnet/../../../libmxnet.so(MXNDArrayLoad+0x3ea) [0x7f192c98202a]
[bt] (4) /home/avivna/brazil-pkg-cache/packages/Python27/Python27-1.0.252183.0/AL2012/DEV.STD.PTHREAD/build/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7f19a4c12320]
[bt] (5) /home/avivna/brazil-pkg-cache/packages/Python27/Python27-1.0.252183.0/AL2012/DEV.STD.PTHREAD/build/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x148) [0x7f19a4c11498]
[bt] (6) /home/avivna/brazil-pkg-cache/packages/Python27/Python27-1.0.252183.0/AL2012/DEV.STD.PTHREAD/build/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x292) [0x7f19a4c090f2]
[bt] (7) /home/avivna/brazil-pkg-cache/packages/Python27/Python27-1.0.252183.0/AL2012/DEV.STD.PTHREAD/build/lib/python2.7/lib-dynload/_ctypes.so(+0x9fe4) [0x7f19a4bfffe4]
[bt] (8) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f19a66f2483]
[bt] (9) /home/avivna/workplace/IhmDSATrainingGluon/build/IhmDSATraining/IhmDSATraining-1.0/AL2012/DEV.STD.PTHREAD/build/private/tmp/brazil-path/testrun.runtimefarm/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3b66) [0x7f19a67a6466]
## Minimum reproducible example
(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)
## Steps to reproduce
See attached script
Code was ran on python 2.7.
1. Activate python 2.7 interpreter
2.Run: execfile('mxnet_profiling_exception.py')
## What have you tried to solve it?
1. The issue is not recreated if all mxnet io operations are stopped before profiler is activated.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services