Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/05/04 09:55:17 UTC
[GitHub] dwSun opened a new issue #10809: Check failed: format != mkl_mem_->GetFormat() (5 vs. 5)
dwSun opened a new issue #10809: Check failed: format != mkl_mem_->GetFormat() (5 vs. 5)
URL: https://github.com/apache/incubator-mxnet/issues/10809
## Description
The process crashes while training a model.
Using the code from [this tutorial](http://mxnet.incubator.apache.org/tutorials/gluon/datasets.html), I tried to train my own model with MobileNetV2, but it crashed with mxnet-mkl 1.2.0b20180503 from PyPI.
The same code works with mxnet-mkl 1.1.0 from PyPI.
Batch sizes 32 and 16 reproduce this error; other sizes such as 8 or 32 do not seem to. A smaller network does not reproduce it either.
I am not sure whether this error is related to PR #10317.
It may be the same error as issue #10807.
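Since the crash seems to depend on batch size, a small sweep helper can record which settings fail. This is a hypothetical sketch: `sweep_batch_sizes` and `train_one_epoch` are illustrative names, and the real training loop in `fashion.py` is assumed to be wrapped in a function that takes the batch size as its only argument. The stub below only simulates failures.

```python
from typing import Callable, Dict, Iterable


def sweep_batch_sizes(
    train_one_epoch: Callable[[int], None],
    batch_sizes: Iterable[int],
) -> Dict[int, str]:
    """Run a (hypothetical) training function once per batch size and
    record whether it completed or which exception it raised."""
    results: Dict[int, str] = {}
    for bs in batch_sizes:
        try:
            train_one_epoch(bs)
            results[bs] = "ok"
        except Exception as exc:  # mxnet's MXNetError derives from Exception
            results[bs] = f"failed: {type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for the real training loop; it fails for
    # batch sizes 16 and 32 only to mimic the report above.
    def fake_train(bs: int) -> None:
        if bs in (16, 32):
            raise RuntimeError("Check failed: format != mkl_mem_->GetFormat() (5 vs. 5)")

    for bs, status in sweep_batch_sizes(fake_train, [8, 16, 32, 64]).items():
        print(bs, status)
```

Running each batch size in a separate process is likely safer with the real model, so that one failure cannot affect later runs.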
## Environment info (Required)
Code to reproduce the crash:
[crash.zip](https://github.com/apache/incubator-mxnet/files/1973878/crash.zip)
Run it with:
```sh
python3 fashion.py
```
Package used (Python/R/Scala/Julia): Python
```
% pip3 list
Package         Version
--------------- --------------
certifi         2018.4.16
chardet         3.0.4
graphviz        0.8.3
idna            2.6
mxnet-mkl       1.2.0b20180503
numpy           1.14.3
pandas          0.22.0
pip             10.0.1
pkg-resources   0.0.0
python-dateutil 2.7.2
pytz            2018.4
requests        2.18.4
setuptools      39.1.0
six             1.11.0
urllib3         1.22
wheel           0.31.0
```
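When more than one MXNet wheel could be installed side by side (for example `mxnet` and `mxnet-mkl`), it can help to report only the versions that matter. The sketch below uses the standard-library `importlib.metadata` module, which assumes Python 3.8 or newer; the environment above uses Python 3.6, where `pip3 list` as shown is the simpler option. The helper name `package_versions` is illustrative.

```python
from importlib import metadata  # standard library, Python 3.8+
from typing import Dict, Iterable, Optional


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print in the same two-column layout as `pip3 list`.
    for name, ver in package_versions(["mxnet", "mxnet-mkl", "numpy"]).items():
        print(f"{name:<15} {ver or 'not installed'}")
```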
## Error Message:
```
% python3 fashion.py
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
Epoch 0, training loss: 2.55, validation loss: 2.31
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
Epoch 1, training loss: 2.56, validation loss: 2.35
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:51] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:51] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:51] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
Traceback (most recent call last):
  File "fashion.py", line 71, in <module>
    valid_loss = cumulative_valid_loss.asscalar()/valid_samples
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1894, in asscalar
    return self.asnumpy()[0]
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1876, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:28:51] src/ndarray/ndarray.cc:351: Check failed: format != mkl_mem_->GetFormat() (5 vs. 5)
Stack trace returned 10 entries:
[bt] (0) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x17009d) [0x7fba25e2f09d]
[bt] (1) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x170468) [0x7fba25e2f468]
[bt] (2) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a4a1b8) [0x7fba287091b8]
[bt] (3) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a4a29e) [0x7fba2870929e]
[bt] (4) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2899644) [0x7fba28558644]
[bt] (5) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x289d151) [0x7fba2855c151]
[bt] (6) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2899d0b) [0x7fba28558d0b]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbbc90) [0x7fba1ba04c90]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x75aa) [0x7fba37df35aa]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fba36f3ecbf]
```
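The Python traceback points at `asscalar()` in `fashion.py` rather than at the operator that failed. MXNet executes operators asynchronously, and the native backtrace ends in `libpthread`/`clone`, which suggests the check ran on an engine worker thread; the error is then reported when Python next waits for a result, here `asscalar()` calling `asnumpy()`. The message itself has the layout of a `CHECK_NE` assertion: it requires the two formats to differ, and `(5 vs. 5)` shows both values were equal. The standard-library sketch below is only an analogy, not MXNet code: `check_ne`, `CheckError`, and `fake_operator` are hypothetical names, and a `ThreadPoolExecutor` stands in for the engine.

```python
from concurrent.futures import ThreadPoolExecutor


class CheckError(RuntimeError):
    """Hypothetical stand-in for the error a failed check raises."""


def check_ne(x_expr: str, y_expr: str, x, y) -> None:
    # Imitates the message layout of a failed CHECK_NE: "Check failed: a != b (va vs. vb)".
    if x == y:
        raise CheckError(f"Check failed: {x_expr} != {y_expr} ({x} vs. {y})")


def fake_operator(target_format: int, current_format: int) -> int:
    # Hypothetical operator body that expects a format conversion to be needed.
    check_ne("format", "mkl_mem_->GetFormat()", target_format, current_format)
    return target_format


with ThreadPoolExecutor(max_workers=1) as engine:
    # submit() returns immediately, like pushing an operator to the engine.
    pending = engine.submit(fake_operator, 5, 5)
    print("operator queued; nothing raised yet")
    try:
        pending.result()  # synchronization point, analogous to asscalar()/asnumpy()
    except CheckError as err:
        print("raised at sync point:", err)
```

For debugging, setting the environment variable `MXNET_ENGINE_TYPE=NaiveEngine` should make MXNet run operators synchronously, so the Python traceback ends closer to the call that actually triggers the failing check.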
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services