Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/10/01 02:45:49 UTC

[GitHub] ShoufaChen opened a new issue #12708: SyncBatchNorm not supports 1D batch norm

ShoufaChen opened a new issue #12708: SyncBatchNorm not supports  1D batch norm
URL: https://github.com/apache/incubator-mxnet/issues/12708
 
 
   Does SyncBatchNorm not support 1D batch norm?
   ## Environment info (Required)
   Ubuntu 18.04
   CUDA 9.0
   mxnet 1.3.0
   python 3.6
   
   
   I added a module on top of the gluon-cv YOLO model that uses a `Conv1D` layer followed by a batch norm layer. The code is as follows:
   ```python
   # Use plain BatchNorm on a single device, SyncBatchNorm across several devices.
   if num_sync_bn_devices < 1:
       self.W.add(nn.BatchNorm(beta_initializer='zeros', gamma_initializer='zeros'))
   else:
       self.W.add(gluon.contrib.nn.SyncBatchNorm(num_devices=num_sync_bn_devices,
                                                 beta_initializer='zeros',
                                                 gamma_initializer='zeros'))
   ```
   Note that this batch norm layer follows a `Conv1D` layer. When `num_sync_bn_devices < 1`, this works properly. However, when `num_sync_bn_devices >= 1`, I get the following error:
   
   ## Error Message:
   ```shell
     File "train_yolo3.py", line 280, in <module>
       train(net, train_data, val_data, eval_metric, ctx, args)
     File "train_yolo3.py", line 216, in train
       obj_metrics.update(0, obj_losses)
     File "/home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/metric.py", line 1289, in update
       self.sum_metric += ndarray.sum(pred).asscalar()
     File "/home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1990, in asscalar
       return self.asnumpy()[0]
     File "/home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1972, in asnumpy
       ctypes.c_size_t(data.size)))
     File "/home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/base.py", line 252, in check_call
       raise MXNetError(py_str(_LIB.MXGetLastError()))
   mxnet.base.MXNetError: [10:34:15] /home/ubuntu/mxnet-distro/mxnet-build/3rdparty/tvm/nnvm/include/nnvm/tuple.h:438: Check failed: dim == static_cast<int>(ndim()) (4 vs. 3) dimension do not match target dimension 4 vs 3
   
   Stack trace returned 10 entries:
   [bt] (0) /home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x36161a) [0x7f675be1861a]
   [bt] (1) /home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x361c31) [0x7f675be18c31]
   [bt] (2) /home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3e46f2) [0x7f675be9b6f2]
   [bt] (3) /home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x319623f) [0x7f675ec4d23f]
   [bt] (4) /home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x332cb52) [0x7f675ede3b52]
   [bt] (5) /home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2cb77b4) [0x7f675e76e7b4]
   [bt] (6) /home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2aeb58a) [0x7f675e5a258a]
   [bt] (7) /home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2aebbe6) [0x7f675e5a2be6]
   [bt] (8) /home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a4b8dd) [0x7f675e5028dd]
   [bt] (9) /home/csf/anaconda3/envs/mxnet/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a4b8c7) [0x7f675e5028c7]
   ```
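
The `4 vs. 3` in the failed check above suggests that the operator expected a 4-dimensional tensor but received a 3-dimensional one, which would be consistent with `Conv1D` output laid out as (batch, channels, width). One possible workaround, which is an assumption and not something confirmed in this report, is to insert a singleton axis before the synchronized batch norm and drop it again afterwards. The sketch below only covers the shape bookkeeping for that idea in plain Python. The helper names `to_4d` and `to_3d` are hypothetical and not part of MXNet; in MXNet the corresponding step would be something like `expand_dims(axis=2)` before the layer and a `reshape` after it.

```python
def to_4d(shape):
    """Map a Conv1D-style shape (N, C, W) to (N, C, 1, W).

    Hypothetical helper: it inserts a singleton "height" axis so that a
    layer expecting 4-D input could, in principle, accept the tensor.
    """
    if len(shape) != 3:
        raise ValueError("expected a 3-D shape (N, C, W), got %r" % (shape,))
    n, c, w = shape
    return (n, c, 1, w)


def to_3d(shape):
    """Undo to_4d: map (N, C, 1, W) back to (N, C, W)."""
    if len(shape) != 4 or shape[2] != 1:
        raise ValueError("expected a 4-D shape (N, C, 1, W), got %r" % (shape,))
    n, c, _, w = shape
    return (n, c, w)


# Example: a batch of 8 sequences with 16 channels and width 100.
conv1d_out = (8, 16, 100)
bn_in = to_4d(conv1d_out)   # shape the sync batch norm layer would see
bn_out = to_3d(bn_in)       # shape handed to the rest of the network
```

Because the added axis has length 1, the per-channel statistics are computed over the same elements as before, so the normalization result should be unchanged by the reshape.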
