Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/07/09 14:53:33 UTC
[GitHub] [incubator-mxnet] maybeLee commented on issue #20416: MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [256], got [1]
maybeLee commented on issue #20416:
URL: https://github.com/apache/incubator-mxnet/issues/20416#issuecomment-877246558
Hi, to make this bug easier to reproduce and understand, I simplified the triggering model into a very simple three-layer randomly generated model.
The model contains three layers: one softmax, one max pooling, and one batch normalization. I create these models using Keras, and the weights of the batch normalization layer are randomly generated.
You can reproduce the bug by running the following code with MXNet version 1.8.0; _you don't need to use any of the other trained models:_
```python
import os
import argparse
import sys
import warnings

# Select the Keras backend before importing keras
parse = argparse.ArgumentParser()
parse.add_argument("--bk", type=str, default="mxnet", help="the name of the backend")
flags, _ = parse.parse_known_args(sys.argv[1:])
os.environ["KERAS_BACKEND"] = flags.bk

import keras
from keras import layers
import numpy as np

warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)

# Three-layer model: softmax -> max pooling -> batch normalization
model_1 = keras.models.Sequential()
model_1.add(layers.Softmax())
model_1.add(layers.MaxPooling2D())
model_1.add(layers.BatchNormalization())

x = np.random.rand(1, 3, 3, 256)
pred = model_1.predict(x)
print(pred)
```
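For reference, the expected shape at each layer can be worked out by hand (a minimal sketch; the pooling arithmetic assumes Keras's `MaxPooling2D` defaults of `pool_size=(2, 2)`, `strides=(2, 2)`, and `padding='valid'`):

```python
def maxpool2d_out(h, w, pool=2, stride=2):
    """Output spatial dims for 'valid' max pooling (Keras defaults)."""
    return (h - pool) // stride + 1, (w - pool) // stride + 1

# Input to the toy model: (batch=1, height=3, width=3, channels=256)
n, h, w, c = 1, 3, 3, 256

# Softmax is elementwise over the last axis: shape unchanged -> (1, 3, 3, 256)
# MaxPooling2D with a 2x2 pool and stride 2: floor((3 - 2) / 2) + 1 = 1 per spatial dim
ph, pw = maxpool2d_out(h, w)
print((n, ph, pw, c))  # (1, 1, 1, 256)

# BatchNormalization (channels_last) keeps the shape (1, 1, 1, 256); its
# gamma/beta/moving-mean/moving-variance parameters each have shape (256,),
# matching the "[256]" in the error message below.
```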
Assuming you saved the above toy program as `try.py`, run the following command:
- `python try.py --bk mxnet`
You will hit a crash with the same symptom I mentioned before:
```
Traceback (most recent call last):
  File "try.py", line 23, in <module>
    pred = model_1.predict(x)
  File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training.py", line 1184, in predict
    steps=steps)
  File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 295, in predict_loop
    batch_outs = f(ins_batch)
  File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5645, in predict_function
    data, label, _, data_shapes, label_shapes = self._adjust_module(inputs, 'pred')
  File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5525, in _adjust_module
    self._set_weights()
  File "/root/anaconda3/envs/diffcu_mxnet/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 5573, in _set_weights
    allow_missing=True)
  File "/mxnet/incubator-mxnet/python/mxnet/module/bucketing_module.py", line 220, in set_params
    force_init=force_init, allow_extra=allow_extra)
  File "/mxnet/incubator-mxnet/python/mxnet/module/module.py", line 358, in set_params
    self._exec_group.set_params(arg_params, aux_params, allow_extra=allow_extra)
  File "/mxnet/incubator-mxnet/python/mxnet/module/executor_group.py", line 422, in set_params
    exec_.copy_params_from(arg_params, aux_params, allow_extra_params=allow_extra)
  File "/mxnet/incubator-mxnet/python/mxnet/executor.py", line 367, in copy_params_from
    array.astype(dst.dtype).copyto(dst)
  File "/mxnet/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 2663, in copyto
    return _internal._copyto(self, out=other)
  File "<string>", line 27, in _copyto
  File "/mxnet/incubator-mxnet/python/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/mxnet/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "src/operator/numpy/linalg/./../../tensor/../elemwise_op_common.h", line 135
MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [256], got [1]
```
But if you run the same program with CNTK as the backend (`python try.py --bk cntk`), everything works fine.
One interesting observation: if I delete any one of the three layers (`softmax`, `batch normalization`, or `max pooling`), no crash happens.
Further, after some investigation, I suspect this bug is caused by wrong shape inference in MXNet.
When I change the input shape to `x = np.random.rand(1, 3, 3, 1)`, `x = np.random.rand(1, 8, 8, 4)`, or `x = np.random.rand(1, 5, 3, 1)`, everything works fine and MXNet does not crash.
**But if I set the input shape to `x = np.random.rand(1, 3, 3, 10)`, where the `-1`-th dimension does not match the `-2`-th dimension after max pooling, MXNet crashes and reports the check-failed error above.**
Therefore, I assume that some code logic inside the `elemwise_op_common.h` file expects the `-1`-th dimension to be consistent with the `-2`-th dimension.
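The pass/fail pattern above can be encoded as a quick sanity check (a sketch of my hypothesis only, not MXNet's actual inference logic; `predicts_crash` is a hypothetical helper, and the pooling arithmetic assumes Keras's default `MaxPooling2D` settings):

```python
def pooled_shape(shape, pool=2, stride=2):
    """Shape after Keras MaxPooling2D (channels_last, 'valid' padding, defaults)."""
    n, h, w, c = shape
    return (n, (h - pool) // stride + 1, (w - pool) // stride + 1, c)

def predicts_crash(shape):
    """Hypothesis: the crash occurs exactly when, after pooling, the -1-th
    (channel) dimension differs from the -2-th (width) dimension."""
    out = pooled_shape(shape)
    return out[-1] != out[-2]

# Cases observed above:
for shape, crashes in [((1, 3, 3, 256), True),   # reported crash
                       ((1, 3, 3, 1),  False),   # works
                       ((1, 8, 8, 4),  False),   # works
                       ((1, 5, 3, 1),  False),   # works
                       ((1, 3, 3, 10), True)]:   # crashes
    assert predicts_crash(shape) == crashes
print("hypothesis matches all observed cases")
```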
Can you help check whether this is a real problem, and what the root cause of the issue is?
Many thanks for your help.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org