Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/12/26 08:37:27 UTC

[GitHub] [incubator-mxnet] Justobe opened a new issue #19717: mxnet.base.MXNetError: MXNetError: Error in operator batchnorm6

Justobe opened a new issue #19717:
URL: https://github.com/apache/incubator-mxnet/issues/19717


   ## Description
   MXNet throws an exception when I build my model with mxnet as the Keras backend. The same script runs successfully on other Keras backends (such as tensorflow and cntk). On further investigation, the problem appears to be caused by a batch normalization layer in the model when the mxnet backend is used.
   This issue was already reported in #15721, but it still occurs with the latest keras-mxnet 2.2.4.2 and mxnet-cu101 1.7.
   
   
   ### Error Message
   > Traceback (most recent call last):
     File "crash_checker.py", line 67, in <module>
       model.add(Dense(10, activation='softmax'))
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/keras/engine/sequential.py", line 181, in add
       output_tensor = layer(self.outputs[0])
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/keras/engine/base_layer.py", line 470, in __call__
       output = self.call(inputs, **kwargs)
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/keras/layers/core.py", line 893, in call
       output = K.bias_add(output, self.bias, data_format='channels_last')
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 94, in func_wrapper
       train_symbol = func(*args, **kwargs)
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 3982, in bias_add
       x_dim = ndim(x)
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 535, in ndim
       shape = x.shape
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 4395, in shape
       return self._get_shape()
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/keras/backend/mxnet_backend.py", line 4404, in _get_shape
       _, out_shape, _ = self.symbol.infer_shape_partial()
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1177, in infer_shape_partial
       return self._infer_shape_impl(True, *args, **kwargs)
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1265, in _infer_shape_impl
       ctypes.byref(complete)))
     File "/root/anaconda3/envs/mxnet_170/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   **mxnet.base.MXNetError: MXNetError: Error in operator batchnorm6: [16:26:44] include/mxnet/./tuple.h:245: Check failed: i >= 0 && i < ndim(): index = -2 must be in range [0, -1)**
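
One possible reading of the last line of the traceback: the range `[0, -1)` suggests that the shape being indexed reported `ndim() == -1`, which would be consistent with a shape whose rank had not been inferred yet, so no index (including `-2`) could pass the check. The sketch below re-creates that kind of bounds check in plain Python. The names `checked_index`, `describe`, and `UNKNOWN_NDIM` are hypothetical and only serve the illustration; this is not MXNet's source code.

```python
# Illustrative re-creation of the bounds check quoted in the error message.
# The helper names and the use of -1 for "rank not known yet" are assumptions
# made for this sketch; this is not MXNet's actual implementation.

UNKNOWN_NDIM = -1  # assumed sentinel for a shape whose rank is not inferred yet


def checked_index(i, ndim):
    """Return i if it is a valid axis index for a shape of rank ndim."""
    if not (0 <= i < ndim):
        raise IndexError(
            "Check failed: i >= 0 && i < ndim(): "
            "index = {} must be in range [0, {})".format(i, ndim)
        )
    return i


def describe(i, ndim):
    """Return "ok" for a valid index, otherwise the failure message."""
    try:
        checked_index(i, ndim)
        return "ok"
    except IndexError as err:
        return str(err)


if __name__ == "__main__":
    print(describe(1, 2))              # valid axis of a rank-2 shape
    print(describe(-2, UNKNOWN_NDIM))  # same wording as the reported failure
```

Under this reading, the negative index is only part of the story: with an unknown rank the valid range is empty, so the check fails for any index.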
   
   
   
   ## To Reproduce
   The following script reproduces the bug:
   
   ```
   import os
   import sys
   bk = sys.argv[1]
   os.environ['KERAS_BACKEND'] = bk
   from keras import backend as K
   import keras
   
   
   from keras.models import Sequential
   from keras.layers.core import Dense
   from keras.layers import Conv2D,MaxPooling2D,BatchNormalization,Flatten,Dropout
   
   model = Sequential()
   
   model.add(Conv2D(96, (3,3), strides=(2,2), activation='relu', padding='same', input_shape=(32, 32, 3,)))
   model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
   # Batch normalization (the original AlexNet uses local response normalization here)
   model.add(BatchNormalization())
   
   model.add(Conv2D(96, (3,3), activation='relu', padding='same'))
   model.add(MaxPooling2D(pool_size=(3, 3), strides=(2,2)))
   model.add(BatchNormalization())
   
   model.add(Conv2D(192, (3,3), activation='relu', padding='same'))
   model.add(Conv2D(192, (3,3), activation='relu', padding='same'))
   model.add(Conv2D(256, (3,3), activation='relu', padding='same'))
   model.add(MaxPooling2D(pool_size=(3, 3), strides=(2,2)))
   model.add(BatchNormalization())
   
   model.add(Flatten())
   model.add(Dense(512, activation='tanh'))
   model.add(Dropout(0.5))
   model.add(Dense(256, activation='tanh'))
   
   # With the mxnet backend, the script only runs if the next line is commented out.
   # With tensorflow and cntk, it runs successfully with the line in place.
   model.add(BatchNormalization())
   
   model.add(Dropout(0.5))
   model.add(Dense(10, activation='softmax'))
   
   
   # print the model summary
   model.summary()
   
   ```
   
   ### Steps to reproduce
   `python myscript.py mxnet` (replace `mxnet` with `tensorflow` to test with the tensorflow backend)
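
To compare backends in one go, the command above can be wrapped in a small driver. This is a minimal sketch: it assumes the reproduction script is saved as `myscript.py` and that each listed backend is installed in the current environment. `run_backends` is a hypothetical helper name, not part of Keras or MXNet.

```python
import subprocess
import sys


def run_backends(script, backends):
    """Run `script` once per backend name and return {backend: exit code}."""
    results = {}
    for backend in backends:
        # Same invocation as in the step above: python <script> <backend>
        proc = subprocess.run(
            [sys.executable, script, backend],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        results[backend] = proc.returncode
    return results


if __name__ == "__main__":
    # "myscript.py" is the file name used in the step above.
    outcome = run_backends("myscript.py", ["mxnet", "tensorflow", "cntk"])
    for name, code in outcome.items():
        status = "ok" if code == 0 else "failed (exit code {})".format(code)
        print(name, status)
```

A non-zero exit code for `mxnet` alongside zero for the other backends would match the behaviour described in this report.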
   
   
   ## Environment
   ```
   Package             Version
   ------------------- -------------------
   cached-property     1.5.2
   certifi             2020.12.5
   chardet             4.0.0
   cycler              0.10.0
   decorator           4.4.2
   graphviz            0.8.4
   h5py                2.10.0
   idna                2.10
   Keras-Applications  1.0.8
   keras-mxnet         2.2.4.2
   Keras-Preprocessing 1.1.2
   kiwisolver          1.3.1
   matplotlib          3.2.2
   mxnet-cu101         1.7.0
   networkx            2.5
   numpy               1.19.4
   pandas              0.23.0
   Pillow              5.1.0
   pip                 20.3.3
   pyparsing           2.4.7
   python-dateutil     2.8.1
   pytz                2020.5
   PyWavelets          1.1.1
   PyYAML              5.3.1
   redis               3.3.2
   requests            2.25.1
   scikit-image        0.13.1
   scikit-learn        0.19.1
   scipy               1.1.0
   setuptools          51.0.0.post20201207
   six                 1.15.0
   urllib3             1.26.2
   wheel               0.36.2
   
   
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org


[GitHub] [incubator-mxnet] yangshuo0323 commented on issue #19717: mxnet.base.MXNetError: MXNetError: Error in operator batchnorm6

Posted by GitBox <gi...@apache.org>.
yangshuo0323 commented on issue #19717:
URL: https://github.com/apache/incubator-mxnet/issues/19717#issuecomment-770141771


   I see you have trained your model with MXNet version 1.7.0. I want to train BERT on multiple GPUs, and I have a separate question for you. Have you run into this problem:
   ```
   [1,4]<stderr>:===================
   [1,5]<stderr>:[node106:26502:0:26502] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x30)
   [1,5]<stderr>:==== backtrace ====
   [1,6]<stderr>:[node106:26503:0:26503] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x30)
   [1,6]<stderr>:==== backtrace ====
   [1,5]<stderr>:    0  /usr/lib/libucs.so.0(+0x1fcec) [0x7f40f065bcec]
   [1,5]<stderr>:    1  /usr/lib/libucs.so.0(+0x1ff64) [0x7f40f065bf64]
   [1,5]<stderr>:    2  /lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x4) [0x7f42ead77d44]
   [1,5]<stderr>:    3  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/mxnet/libmxnet.so(_ZN5mxnet6engine11ThreadedVar21AppendWriteDependencyEPNS0_8OprBlockE+0x44) [0x7f428d022564]
   [1,5]<stderr>:    4  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine4PushEPNS0_3OprENS_7ContextEib+0x280) [0x7f428d025790]
   [1,5]<stderr>:    5  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine9PushAsyncESt8functionIFvNS_10RunContextENS0_18CallbackOnCompleteEEENS_7ContextERKSt6vectorIPNS0_3VarESaISA_EESE_NS_10FnPropertyEiPKcb+0x131) [0x7f428d01ded1]
   [1,5]<stderr>:    6  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/mxnet/libmxnet.so(_ZN5mxnet10CopyFromToERKNS_7NDArrayES2_ib+0xaf4) [0x7f428cff89d4]
   [1,5]<stderr>:    7  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/horovod/mxnet/mpi_lib.cpython-37m-x86_64-linux-gnu.so(_ZN7horovod5mxnet29PushHorovodOperationCudaOnCPUENS_6common7Request11RequestTypeEPN5mxnet7NDArrayES6_PKcii+0xe6f) [0x7f410243a18f]
   [1,5]<stderr>:    8  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/horovod/mxnet/mpi_lib.cpython-37m-x86_64-linux-gnu.so(horovod_mxnet_broadcast_async+0x54) [0x7f4102431d84]
   [1,5]<stderr>:    9  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/lib-dynload/../../libffi.so.7(+0x69dd) [0x7f42e9da49dd]
   [1,5]<stderr>:   10  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/lib-dynload/../../libffi.so.7(+0x6067) [0x7f42e9da4067]
   [1,5]<stderr>:   11  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f42eafd527e]
   [1,5]<stderr>:   12  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x12cb4) [0x7f42eafd5cb4]
   [1,5]<stderr>:   13  python(_PyObject_FastCallKeywords+0x48b) [0x564d0453c00b]
   [1,5]<stderr>:   14  python(_PyEval_EvalFrameDefault+0x51d1) [0x564d045a09a1]
   [1,5]<stderr>:   15  python(_PyEval_EvalCodeWithName+0x2f9) [0x564d044e42b9]
   [1,5]<stderr>:   16  python(_PyFunction_FastCallKeywords+0x387) [0x564d04534497]
   [1,5]<stderr>:   17  python(_PyEval_EvalFrameDefault+0x14ea) [0x564d0459ccba]
   [1,5]<stderr>:   18  python(_PyEval_EvalCodeWithName+0x2f9) [0x564d044e42b9]
   [1,5]<stderr>:   19  python(_PyFunction_FastCallKeywords+0x387) [0x564d04534497]
   [1,5]<stderr>:   20  python(_PyEval_EvalFrameDefault+0x14ea) [0x564d0459ccba]
   [1,5]<stderr>:   21  python(_PyFunction_FastCallKeywords+0xfb) [0x564d0453420b]
   [1,5]<stderr>:   22  python(_PyEval_EvalFrameDefault+0x416) [0x564d0459bbe6]
   [1,5]<stderr>:   23  python(_PyEval_EvalCodeWithName+0x2f9) [0x564d044e42b9]
   [1,5]<stderr>:   24  python(PyEval_EvalCodeEx+0x44) [0x564d044e51d4]
   [1,5]<stderr>:   25  python(PyEval_EvalCode+0x1c) [0x564d044e51fc]
   [1,5]<stderr>:   26  python(+0x22bf44) [0x564d045faf44]
   [1,5]<stderr>:   27  python(PyRun_FileExFlags+0xa1) [0x564d046052b1]
   [1,5]<stderr>:   28  python(PyRun_SimpleFileExFlags+0x1c3) [0x564d046054a3]
   [1,5]<stderr>:   29  python(+0x2375d5) [0x564d046065d5]
   [1,5]<stderr>:   30  python(_Py_UnixMain+0x3c) [0x564d046066fc]
   [1,5]<stderr>:   31  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f42ea9c4840]
   [1,5]<stderr>:   32  python(+0x1dc3c0) [0x564d045ab3c0]
   [1,5]<stderr>:===================
   [1,6]<stderr>:    0  /usr/lib/libucs.so.0(+0x1fcec) [0x7f1a6c25bcec]
   [1,6]<stderr>:    1  /usr/lib/libucs.so.0(+0x1ff64) [0x7f1a6c25bf64]
   [1,6]<stderr>:    2  /lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x4) [0x7f1c66a2ad44]
   [1,6]<stderr>:    3  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/mxnet/libmxnet.so(_ZN5mxnet6engine11ThreadedVar21AppendWriteDependencyEPNS0_8OprBlockE+0x44) [0x7f1c08cd5564]
   [1,6]<stderr>:    4  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine4PushEPNS0_3OprENS_7ContextEib+0x280) [0x7f1c08cd8790]
   [1,6]<stderr>:    5  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine9PushAsyncESt8functionIFvNS_10RunContextENS0_18CallbackOnCompleteEEENS_7ContextERKSt6vectorIPNS0_3VarESaISA_EESE_NS_10FnPropertyEiPKcb+0x131) [0x7f1c08cd0ed1]
   [1,6]<stderr>:    6  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/mxnet/libmxnet.so(_ZN5mxnet10CopyFromToERKNS_7NDArrayES2_ib+0xaf4) [0x7f1c08cab9d4]
   [1,6]<stderr>:    7  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/horovod/mxnet/mpi_lib.cpython-37m-x86_64-linux-gnu.so(_ZN7horovod5mxnet29PushHorovodOperationCudaOnCPUENS_6common7Request11RequestTypeEPN5mxnet7NDArrayES6_PKcii+0xe6f) [0x7f1a7e0e118f]
   [1,6]<stderr>:    8  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/site-packages/horovod/mxnet/mpi_lib.cpython-37m-x86_64-linux-gnu.so(horovod_mxnet_broadcast_async+0x54) [0x7f1a7e0d8d84]
   [1,6]<stderr>:    9  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/lib-dynload/../../libffi.so.7(+0x69dd) [0x7f1c65a579dd]
   [1,6]<stderr>:   10  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/lib-dynload/../../libffi.so.7(+0x6067) [0x7f1c65a57067]
   [1,6]<stderr>:   11  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f1c66c8827e]
   [1,6]<stderr>:   12  /home/yangshuo/miniconda3/envs/yangshuo/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x12cb4) [0x7f1c66c88cb4]
   [1,6]<stderr>:   13  python(_PyObject_FastCallKeywords+0x48b) [0x562df52e800b]
   [1,6]<stderr>:   14  python(_PyEval_EvalFrameDefault+0x51d1) [0x562df534c9a1]
   [1,6]<stderr>:   15  python(_PyEval_EvalCodeWithName+0x2f9) [0x562df52902b9]
   [1,6]<stderr>:   16  python(_PyFunction_FastCallKeywords+0x387) [0x562df52e0497]
   [1,6]<stderr>:   17  python(_PyEval_EvalFrameDefault+0x14ea) [0x562df5348cba]
   [1,6]<stderr>:   18  python(_PyEval_EvalCodeWithName+0x2f9) [0x562df52902b9]
   [1,6]<stderr>:   19  python(_PyFunction_FastCallKeywords+0x387) [0x562df52e0497]
   [1,6]<stderr>:   20  python(_PyEval_EvalFrameDefault+0x14ea) [0x562df5348cba]
   [1,6]<stderr>:   21  python(_PyFunction_FastCallKeywords+0xfb) [0x562df52e020b]
   [1,6]<stderr>:   22  python(_PyEval_EvalFrameDefault+0x416) [0x562df5347be6]
   [1,6]<stderr>:   23  python(_PyEval_EvalCodeWithName+0x2f9) [0x562df52902b9]
   [1,6]<stderr>:   24  python(PyEval_EvalCodeEx+0x44) [0x562df52911d4]
   [1,6]<stderr>:   25  python(PyEval_EvalCode+0x1c) [0x562df52911fc]
   [1,6]<stderr>:   26  python(+0x22bf44) [0x562df53a6f44]
   [1,6]<stderr>:   27  python(PyRun_FileExFlags+0xa1) [0x562df53b12b1]
   [1,6]<stderr>:   28  python(PyRun_SimpleFileExFlags+0x1c3) [0x562df53b14a3]
   [1,6]<stderr>:   29  python(+0x2375d5) [0x562df53b25d5]
   [1,6]<stderr>:   30  python(_Py_UnixMain+0x3c) [0x562df53b26fc]
   [1,6]<stderr>:   31  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f1c66677840]
   [1,6]<stderr>:   32  python(+0x1dc3c0) [0x562df53573c0]
   [1,6]<stderr>:===================
   --------------------------------------------------------------------------
   Primary job  terminated normally, but 1 process returned
   a non-zero exit code. Per user-direction, the job has been aborted.
   --------------------------------------------------------------------------
   --------------------------------------------------------------------------
   mpirun noticed that process rank 7 with PID 0 on node node106 exited on signal 11 (Segmentation fault).
   ```
   - My environment is:
   ```
   gluonnlp               0.10.0
   horovod                0.19.5
   mxnet-cu102            1.7.0
   ```




[GitHub] [incubator-mxnet] yangshuo0323 commented on issue #19717: mxnet.base.MXNetError: MXNetError: Error in operator batchnorm6

Posted by GitBox <gi...@apache.org>.
yangshuo0323 commented on issue #19717:
URL: https://github.com/apache/incubator-mxnet/issues/19717#issuecomment-770141916


   @Justobe 




[GitHub] [incubator-mxnet] szha commented on issue #19717: mxnet.base.MXNetError: MXNetError: Error in operator batchnorm6

Posted by GitBox <gi...@apache.org>.
szha commented on issue #19717:
URL: https://github.com/apache/incubator-mxnet/issues/19717#issuecomment-754320285


   cc @sandeep-krishnamurthy 




[GitHub] [incubator-mxnet] Justobe commented on issue #19717: mxnet.base.MXNetError: MXNetError: Error in operator batchnorm6

Posted by GitBox <gi...@apache.org>.
Justobe commented on issue #19717:
URL: https://github.com/apache/incubator-mxnet/issues/19717#issuecomment-770142589


   @yangshuo0323 Sorry, I have not run into a problem like that. The exception in my script was thrown when I used mxnet as the Keras backend.

