You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/08/28 20:16:43 UTC

[GitHub] ankkhedia edited a comment on issue #12310: Flaky test: test_ndarray.test_order

ankkhedia edited a comment on issue #12310: Flaky test: test_ndarray.test_order
URL: https://github.com/apache/incubator-mxnet/issues/12310#issuecomment-416725396
 
 
   It seems like there is a problem while using different dtype ndarrays on GPU with top operator.
   It can be reproduced by running small code snippet with latest mxnet build
   
   ```python
   import mxnet as mx
   import numpy as np
   dat_size=5
   dtype=np.int64
   a_npy= np.arange(dat_size ** 4, dtype=dtype).reshape((dat_size, dat_size, dat_size, dat_size))
   a_nd = mx.nd.array(a_npy, ctx=mx.gpu(0), dtype=dtype)
   nd_ret_topk = mx.nd.topk(a_nd, axis=1, ret_typ="mask", k=2, is_ascend=False).asnumpy()
   ```
   The above code snippet led to this failure:
   ```python
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/ubuntu/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 1976, in asnumpy
       ctypes.c_size_t(data.size)))
     File "/home/ubuntu/incubator-mxnet/python/mxnet/base.py", line 252, in check_call
       raise MXNetError(py_str(_LIB.MXGetLastError()))
   mxnet.base.MXNetError: [18:53:51] /home/ubuntu/incubator-mxnet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: misaligned address
   
   Stack trace returned 10 entries:
   [bt] (0) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f92bac6005b]
   [bt] (1) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f92bac60bc8]
   [bt] (2) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mshadow::Stream<mshadow::gpu>::Wait()+0xd8) [0x7f92bd4a6ee8]
   [bt] (3) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x3dc) [0x7f92bd64561c]
   [bt] (4) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x3c2cacb) [0x7f92bdb96acb]
   [bt] (5) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x8e5) [0x7f92bdb90f25]
   [bt] (6) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0xeb) [0x7f92bdba789b]
   [bt] (7) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x4e) [0x7f92bdba7b0e]
   [bt] (8) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x4a) [0x7f92bdb9052a]
   [bt] (9) /home/ubuntu/anaconda3/bin/../lib/libstdc++.so.6(+0xafc5c) [0x7f9289c86c5c]
   ```
   
   The support was added recently as a part of this PR:
   https://github.com/apache/incubator-mxnet/pull/12250
   
    
   @sxjscience 
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services