Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/23 01:07:07 UTC

[GitHub] [incubator-mxnet] eric-haibin-lin commented on issue #18772: horovod seg-fault with mxnet pip wheels

eric-haibin-lin commented on issue #18772:
URL: https://github.com/apache/incubator-mxnet/issues/18772#issuecomment-662772210


   ```
   [1,0]<stdout>:(gdb) bt
   [1,0]<stdout>:#0  0x00007ffff7419b80 in pthread_mutex_lock () from /lib64/libpthread.so.0
   [1,0]<stdout>:#1  0x00007fff68a1b81d in mxnet::engine::ThreadedVar::AppendWriteDependency(mxnet::engine::OprBlock*) ()
   [1,0]<stdout>:   from /home/ec2-user/.local/lib/python3.7/site-packages/mxnet/libmxnet.so
   [1,0]<stdout>:#2  0x00007fff68a176ff in mxnet::engine::ThreadedEngine::Push(mxnet::engine::Opr*, mxnet::Context, int, bool) ()
   [1,0]<stdout>:   from /home/ec2-user/.local/lib/python3.7/site-packages/mxnet/libmxnet.so
   [1,0]<stdout>:#3  0x00007fff68a147a7 in mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool) ()
   [1,0]<stdout>:   from /home/ec2-user/.local/lib/python3.7/site-packages/mxnet/libmxnet.so
   [1,0]<stdout>:#4  0x00007fff688f5f42 in MXEnginePushAsync ()
   [1,0]<stdout>:   from /home/ec2-user/.local/lib/python3.7/site-packages/mxnet/libmxnet.so
   [1,0]<stdout>:#5  0x00007ffdcc11ace9 in horovod::mxnet::PushHorovodOperation (
   [1,0]<stdout>:    op_type=op_type@entry=horovod::common::Request::BROADCAST,
   [1,0]<stdout>:    input=input@entry=0x182fb90, output=output@entry=0x182fb90,
   [1,0]<stdout>:    name=name@entry=0x7ffdd5e63f20 "0.bias", priority=priority@entry=0,
   [1,0]<stdout>:    root_rank=root_rank@entry=0) at horovod/mxnet/mpi_ops.cc:138
   [1,0]<stdout>:#6  0x00007ffdcc116010 in horovod::mxnet::horovod_mxnet_broadcast_async (
   [1,0]<stdout>:    input=0x182fb90, output=0x182fb90, name=0x7ffdd5e63f20 "0.bias",
   [1,0]<stdout>:    root_rank=0, priority=0) at horovod/mxnet/mpi_ops.cc:301
   ```

