Posted to issues@mxnet.apache.org by "Pedro Larroy (JIRA)" <ji...@apache.org> on 2018/05/03 13:59:00 UTC

[jira] [Commented] (MXNET-396) MKLDNN: Non deterministic segfault on test_module.py:test_forward_reshape

    [ https://issues.apache.org/jira/browse/MXNET-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462479#comment-16462479 ] 

Pedro Larroy commented on MXNET-396:
------------------------------------

Got a stack trace:

[WARNING] *** test-level seed set: all "@with_seed()" tests run deterministically ***
test_module.test_forward_reshape ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=11 to reproduce.
[13:54:40] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 81920 bytes with malloc directly
[13:54:40] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 576000 bytes with malloc directly
/work/mxnet/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
 warnings.warn(msg)

Segmentation fault: 11

Stack trace returned 10 entries:
[bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7f7fed68e8fa]
[bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x309619f) [0x7f7ff029b19f]
[bt] (2) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f801aa774b0]
[bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::NDArray::GetMKLDNNData() const+0x637) [0x7f7fefde2a57]
[bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::NDArray::GetMKLDNNDataReorder(mkldnn::memory::primitive_desc const&) const+0x33c) [0x7f7fefde512c]
[bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x26e0) [0x7f7fed68b150]
[bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x28da1ce) [0x7f7fefadf1ce]
[bt] (7) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x29eaed7) [0x7f7fefbefed7]
[bt] (8) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x29eafc1) [0x7f7fefbeffc1]
[bt] (9) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0xcb5) [0x7f7ff01b1f65]
ok
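
Since the crash is intermittent even with fixed seeds, one way to estimate how often it occurs is to rerun the test in a loop and count the runs that segfault. The following is a minimal sketch, not part of the MXNet test suite: count_segfaults is a hypothetical helper, and it assumes a POSIX system where a child killed by SIGSEGV reports a negative return code. It also treats the "Segmentation fault" text seen in the log above as a crash, on the assumption that the handler printing the stack trace may alter the exit status.

```python
import os
import signal
import subprocess


def count_segfaults(cmd, runs, extra_env=None):
    """Run `cmd` `runs` times and return how many runs look like a segfault.

    Hypothetical helper for illustration. A run counts as a crash if the
    child was killed by SIGSEGV (POSIX: return code == -SIGSEGV) or if its
    combined output contains the text "Segmentation fault".
    """
    env = dict(os.environ)
    env.update(extra_env or {})  # e.g. the seeds from the issue description
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
        )
        killed_by_segv = proc.returncode == -signal.SIGSEGV
        printed_segv = b"Segmentation fault" in proc.stdout
        if killed_by_segv or printed_segv:
            crashes += 1
    return crashes
```

For example, count_segfaults(["nosetests", "test_module.py:test_forward_reshape"], 50, {"MXNET_TEST_SEED": "11", "MXNET_MODULE_SEED": "812478194"}) would rerun the test 50 times with the seeds from the issue description. The nosetests invocation and the location of test_module.py are assumptions about the local setup.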

> MKLDNN: Non deterministic segfault on test_module.py:test_forward_reshape 
> --------------------------------------------------------------------------
>
>                 Key: MXNET-396
>                 URL: https://issues.apache.org/jira/browse/MXNET-396
>             Project: Apache MXNet
>          Issue Type: Improvement
>            Reporter: Pedro Larroy
>            Priority: Major
>
> There are random crashes in the given test.
>  
> Even fixing the seeds doesn't trigger a reproduction:
>  
> + export MXNET_TEST_SEED=11
>  + export MXNET_MODULE_SEED=812478194
>  
> [http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/805/pipeline]
> I was able to reproduce it once on a fresh p3 instance with the DLAMI.
> Another failure occurred on a different machine with:
> MXNET_MODULE_SEED=729680784



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
