Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/11 09:19:13 UTC

[GitHub] [incubator-mxnet] xinyu-intel commented on issue #18014: enabling mkldnn leads to segfault in bytePS

URL: https://github.com/apache/incubator-mxnet/issues/18014#issuecomment-612375492
 
 
   Building the latest MXNet without MKLDNN also reproduces the segfault:
   ```
   cmake -DCMAKE_BUILD_TYPE=Debug -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_CUDA=ON -DUSE_MKLDNN=OFF -G Ninja ..
   ```
   ```
   Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
   __GI___pthread_mutex_lock (mutex=0x3a6772617f) at ../nptl/pthread_mutex_lock.c:65
   65	../nptl/pthread_mutex_lock.c: No such file or directory.
   #0  __GI___pthread_mutex_lock (mutex=0x3a6772617f) at ../nptl/pthread_mutex_lock.c:65
   #1  0x00007fa43648b65b in __gthread_mutex_lock (__mutex=0x3a6772617f) at /usr/include/x86_64-linux-gnu/c++/7/bits/gthr-default.h:748
   #2  0x00007fa4364adf3a in std::mutex::lock (this=0x3a6772617f) at /usr/include/c++/7/bits/std_mutex.h:103
   #3  0x00007fa4364c5bf4 in std::lock_guard<std::mutex>::lock_guard (this=0x7ffd3340e270, __m=...) at /usr/include/c++/7/bits/std_mutex.h:162
   #4  0x00007fa4366b2d6e in mxnet::engine::ThreadedVar::AppendWriteDependency (this=0x3a6772615f, opr_block=0x2f08190) at ../src/engine/threaded_engine.cc:74
   #5  0x00007fa4366af4f7 in mxnet::engine::ThreadedEngine::Push (this=0x2f053a0, op=0x2f06630, exec_ctx=..., priority=0, profiling=false) at ../src/engine/threaded_engine.cc:311
   #6  0x00007fa4366af924 in mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool) (this=0x2f053a0, fn=..., exec_ctx=..., const_vars=std::vector of length 0, capacity 0, mutable_vars=std::vector of length 1, capacity 1 = {...}, prop=mxnet::FnProperty::kCPUPrioritized, priority=0, opr_name=0x7fa2b16372ec "BytePSPushPull", wait=false) at ../src/engine/threaded_engine.cc:343
   #7  0x00007fa4364a72f6 in MXEnginePushAsync (async_func=0x7fa2b15659f0 <byteps::mxnet::DoPushPull(void*, void*, void*)>, func_param=0x6ee84170, deleter=0x7fa2b1565040 <byteps::mxnet::(anonymous namespace)::DeletePushPullParam(void*)>, ctx_handle=0x7fa2b7ff8a40 <byteps::mxnet::(anonymous namespace)::MX_EXEC_CTX>, const_vars_handle=0x0, num_const_vars=0, mutable_vars_handle=0x7ffd3340e8a8, num_mutable_vars=1, prop_handle=0x7fa2b1637380 <byteps::mxnet::(anonymous namespace)::MX_FUNC_PROP>, priority=0, opr_name=0x7fa2b16372ec "BytePSPushPull", wait=false) at ../src/c_api/c_api.cc:2665
   #8  0x00007fa2b156579d in byteps::mxnet::byteps_mxnet_push_pull_async (tensor=0x6d41f620, name=<optimized out>, version=0, priority=0, is_average=<optimized out>) at byteps/mxnet/ops.cc:116
   #9  0x00007fa50630fdae in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
   #10 0x00007fa50630f71f in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
   #11 0x00007fa5065235c4 in _ctypes_callproc () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
   #12 0x00007fa506523c33 in ?? () from /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
   ```
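
One detail in the backtrace above may be worth checking: the `this` pointer in frame #4 (`0x3a6772615f`) and the `mutex` pointer in frames #0 to #2 (`0x3a6772617f`) are far smaller than the other heap and stack addresses in the trace. The sketch below is only an aid for reading those two values and does not establish a root cause. The helper name `pointer_as_ascii` is made up for this illustration and is not part of MXNet or BytePS. It reinterprets an address as little-endian bytes (x86_64 byte order) to see whether the bytes look like text.

```python
def pointer_as_ascii(value: int, width: int = 8) -> str:
    """Render an integer as `width` little-endian bytes; non-printable bytes become '.'."""
    raw = value.to_bytes(width, byteorder="little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Values copied from the backtrace (frame #4 `this`, frames #0-#2 `mutex`).
this_ptr = 0x3A6772615F
mutex_ptr = 0x3A6772617F

# Distance between the ThreadedVar object and the mutex that lock_guard tried to lock.
print(hex(mutex_ptr - this_ptr))           # 0x20

# The same `this` value viewed as bytes in memory order.
print(repr(pointer_as_ascii(this_ptr)))    # '_arg:...'
```

The low five bytes of `this` decode to the printable characters `_arg:`, so the `Var*` that reached `AppendWriteDependency` might be string data rather than a valid engine variable. That is only a hypothesis drawn from the address values; confirming it would need inspection of what BytePS passes in `mutable_vars_handle`.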
