You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/08/28 20:16:43 UTC
[GitHub] ankkhedia edited a comment on issue #12310: Flaky test:
test_ndarray.test_order
ankkhedia edited a comment on issue #12310: Flaky test: test_ndarray.test_order
URL: https://github.com/apache/incubator-mxnet/issues/12310#issuecomment-416725396
It seems like there is a problem while using different dtype ndarrays on GPU with top operator.
It can be reproduced by running small code snippet with latest mxnet build
```python
import mxnet as mx
import numpy as np
dat_size=5
dtype=np.int64
a_npy= np.arange(dat_size ** 4, dtype=dtype).reshape((dat_size, dat_size, dat_size, dat_size))
a_nd = mx.nd.array(a_npy, ctx=mx.gpu(0), dtype=dtype)
nd_ret_topk = mx.nd.topk(a_nd, axis=1, ret_typ="mask", k=2, is_ascend=False).asnumpy()
```
The above code snippet led to this failure:
```python
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 1976, in asnumpy
ctypes.c_size_t(data.size)))
File "/home/ubuntu/incubator-mxnet/python/mxnet/base.py", line 252, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [18:53:51] /home/ubuntu/incubator-mxnet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: misaligned address
Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f92bac6005b]
[bt] (1) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f92bac60bc8]
[bt] (2) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mshadow::Stream<mshadow::gpu>::Wait()+0xd8) [0x7f92bd4a6ee8]
[bt] (3) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x3dc) [0x7f92bd64561c]
[bt] (4) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x3c2cacb) [0x7f92bdb96acb]
[bt] (5) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x8e5) [0x7f92bdb90f25]
[bt] (6) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0xeb) [0x7f92bdba789b]
[bt] (7) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x4e) [0x7f92bdba7b0e]
[bt] (8) /home/ubuntu/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x4a) [0x7f92bdb9052a]
[bt] (9) /home/ubuntu/anaconda3/bin/../lib/libstdc++.so.6(+0xafc5c) [0x7f9289c86c5c]
```
The support was added recently as a part of this PR:
https://github.com/apache/incubator-mxnet/pull/12250
@sxjscience
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services