Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/03/04 11:16:49 UTC

[GitHub] [incubator-mxnet] mika-fischer opened a new issue #14317: std::threads spawned by mxnet must catch all exceptions, otherwise the whole application will terminate

URL: https://github.com/apache/incubator-mxnet/issues/14317
 
 
   ## Description
   If an exception escapes the top-level function of a `std::thread`, the C++ runtime calls `std::terminate`, which by default aborts the whole process. A well-behaved library must never let that happen. Therefore every thread spawned by mxnet must catch all exceptions and handle them somehow.
   
   Currently (as of the latest release, 1.3.1) this is not the case, so an error inside mxnet tears down our whole program.
   
   For instance, in `threaded_engine_perdevice.cc`, the `GPUWorker` function is called from the worker threads without a `try` block. The third line of that function is a `CHECK` macro, which may throw an exception, and there are many more `CHECK`s further along the code path.
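   
   To illustrate the kind of guard meant here, the following is a minimal sketch and not mxnet code. `ExceptionSlot`, `SpawnGuarded`, `FailingWorker` and `RunGuardedDemo` are hypothetical names, and `std::runtime_error` stands in for `dmlc::Error`. The thread body is wrapped in `try`/`catch (...)`, and the exception is kept as a `std::exception_ptr` so that another thread can rethrow it where the application is able to handle it.
   
   ```cpp
   #include <exception>
   #include <functional>
   #include <mutex>
   #include <stdexcept>
   #include <string>
   #include <thread>
   
   // Hypothetical helper (not part of mxnet): keeps the first exception that
   // escaped a worker so that another thread can inspect or rethrow it.
   class ExceptionSlot {
    public:
     void Store(std::exception_ptr e) {
       std::lock_guard<std::mutex> lock(mu_);
       if (!error_) error_ = e;  // keep only the first error
     }
   
     // Rethrows the stored exception on the calling thread, if there is one.
     void RethrowIfAny() {
       std::exception_ptr e;
       {
         std::lock_guard<std::mutex> lock(mu_);
         e = error_;
       }
       if (e) std::rethrow_exception(e);
     }
   
    private:
     std::mutex mu_;
     std::exception_ptr error_;
   };
   
   // Hypothetical wrapper: starts a thread whose body can never leak an
   // exception, so std::terminate is not triggered by a failing worker.
   std::thread SpawnGuarded(std::function<void()> body, ExceptionSlot* slot) {
     return std::thread([body, slot]() {
       try {
         body();
       } catch (...) {
         slot->Store(std::current_exception());
       }
     });
   }
   
   // Stand-in for a worker whose check fails (assumption: a plain
   // std::runtime_error replaces dmlc::Error here).
   void FailingWorker() {
     throw std::runtime_error("Check failed: simulated worker error");
   }
   
   // Runs the failing worker on a guarded thread and returns the captured
   // error message, or an empty string if nothing was thrown.
   std::string RunGuardedDemo() {
     ExceptionSlot slot;
     std::thread t = SpawnGuarded(FailingWorker, &slot);
     t.join();  // the slot outlives the thread because we join first
     try {
       slot.RethrowIfAny();
     } catch (const std::exception& e) {
       return e.what();
     }
     return "";
   }
   ```
   
   With a guard like this the process keeps running after the worker fails, and the caller of `RunGuardedDemo()` gets the error message back instead of a `SIGABRT`. How mxnet should surface such an error to the user is a separate design question.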
   
   Here's an example stack trace:
   
   ```
   [2019-03-04T11:47:44.016+01:00] ERROR: terminate called after throwing an instance of 'dmlc::Error'
   [2019-03-04T11:47:44.016+01:00] ERROR: what():  [11:47:44] /src/mxnet-1.3.1/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:139: Check failed: err == CUSOLVER_STATUS_SUCCESS (7 vs. 0) Create cusolver handle failed
   [2019-03-04T11:47:44.016+01:00] ERROR: 
   [2019-03-04T11:47:44.016+01:00] ERROR: Stack trace returned 9 entries:
   [2019-03-04T11:47:44.016+01:00] ERROR: [bt] (0) libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x16c) [0x7f677eb27fcc]
   [2019-03-04T11:47:44.016+01:00] ERROR: [bt] (1) libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x29) [0x7f677eb296f9]
   [2019-03-04T11:47:44.016+01:00] ERROR: [bt] (2) libmxnet.so(mshadow::Stream<mshadow::gpu>* mshadow::NewStream<mshadow::gpu>(bool, bool, int)+0x6db) [0x7f678077323b]
   [2019-03-04T11:47:44.016+01:00] ERROR: [bt] (3) libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x10f) [0x7f678078952f]
   [2019-03-04T11:47:44.016+01:00] ERROR: [bt] (4) libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x3e) [0x7f678078972e]
   [2019-03-04T11:47:44.016+01:00] ERROR: [bt] (5) libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)>, std::shared_ptr<dmlc::ManualEvent> > > >::_M_run()+0x3a) [0x7f678078584a]
   [2019-03-04T11:47:44.016+01:00] ERROR: [bt] (6) libmxnet.so(+0x489793f) [0x7f67828c193f]
   [2019-03-04T11:47:44.016+01:00] ERROR: [bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f674bb4f6db]
   [2019-03-04T11:47:44.016+01:00] ERROR: [bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f674aeb688f]
   [2019-03-04T11:47:44.016+01:00] ERROR: 
   [2019-03-04T11:47:44.016+01:00] ERROR: 
   [2019-03-04T11:47:44.239+01:00] ERROR: Child process native(6254) terminated with error (signal=SIGABRT)
   ```
