Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/28 07:54:56 UTC

[GitHub] wkcn opened a new issue #9216: Loss of Precision in BatchNorm and output_var may be wrong

URL: https://github.com/apache/incubator-mxnet/issues/9216
 
 
   ## Description
   1. BatchNorm loses a little precision
   2. the output_var in BatchNorm may be wrong
   
   ## Environment Info
   OS: Arch Linux 4.14.8
   MXNet: 1.0.0 and 1.0.1 (the latest version, CPU version)
   
   ## Build Config
   make -j8 USE_OPENCV=1 USE_BLAS=openblas
   ***
   
   Hi, there.
   
   I converted [ResNet Model on Caffe](https://github.com/KaimingHe/deep-residual-networks) to [ResNet model on MXNet](https://github.com/wkcn/resnet-v1-mx).
   
   I found that Caffe and MXNet produce different outputs.
   
   The reason is that the two frameworks compute BatchNorm differently.
   
   For the BatchNorm in Caffe, the output is `(x - mean(x)) / sqrt(var(x) + eps)`.
   
   For the BatchNorm in MXNet, the output is `(x - mean(x)) * factor`, and `factor = 1.0 / sqrt(var(x) + eps)`.
   
   I think the MXNet method **loses a little precision** in exchange for **higher performance** (it performs fewer divisions).
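
The two formulas above can be compared directly in NumPy. This is only a minimal sketch, not the Caffe or MXNet code: the function names `bn_divide` and `bn_multiply`, the `eps` value, the input shape, and the use of batch statistics over axis 0 are all assumptions made for illustration.

```python
import numpy as np

def bn_divide(x, eps=1e-5):
    # Caffe-style formula from this issue: divide by the standard deviation.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

def bn_multiply(x, eps=1e-5):
    # MXNet-style formula from this issue: multiply by a precomputed factor.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    factor = np.float32(1.0) / np.sqrt(var + eps)  # rounded once here
    return (x - mean) * factor                     # rounded again here

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8)).astype(np.float32)  # hypothetical input

a = bn_divide(x)
b = bn_multiply(x)
print("max abs difference:", np.abs(a - b).max())
```

In float32 the two results can differ in the last bits, because the reciprocal is rounded before the multiplication rounds again. The size of the difference depends on the input.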
   
   At the same time, I found that the `output_var` in BatchNorm may be wrong. 
   
   The `output_var` is **invstd**, namely the multiplicative inverse of the standard deviation. I think it should be the variance.
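
If `invstd` is defined as `1.0 / sqrt(var + eps)`, matching the `factor` above (an assumption about how the operator fills `output_var`), the variance can be recovered from it. The sketch below shows the relationship with hypothetical helper names:

```python
import numpy as np

def invstd_from_var(var, eps):
    # Multiplicative inverse of the standard deviation, with eps inside the sqrt.
    return 1.0 / np.sqrt(var + eps)

def var_from_invstd(invstd, eps):
    # Invert the formula above: var = 1 / invstd^2 - eps.
    return 1.0 / invstd ** 2 - eps

eps = 1e-5                         # assumed value, for illustration only
var = np.array([0.5, 2.0, 10.0])   # hypothetical per-channel variances
invstd = invstd_from_var(var, eps)
recovered = var_from_invstd(invstd, eps)
print(invstd, recovered)
```

So a caller who expects a variance but receives `invstd` gets values on a different scale, although the two carry the same information when `eps` is known.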
   
   ## Steps to reproduce
   
   Here is [my testing code](https://github.com/wkcn/test_mxnet_bn).
   
   I compare three outputs:
   
   - numpy (computed manually)
   - caffe
   - mxnet
   
   ```
   caffe and numpy 0.0 0.0
   caffe and mx 16.0 2.36527e-07
   numpy and mx 16.0 2.36527e-07
   ```
   
   The first column is the maximum absolute error, and the second column is the maximum relative error.
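
For reference, here is one way such error columns could be computed. This is a sketch and not the linked testing code: the helper `max_errors`, the choice of the second argument as the reference for the relative error, and the sample values are assumptions.

```python
import numpy as np

def max_errors(out, ref, tiny=1e-12):
    # Compare in float64 so the comparison itself adds no rounding noise.
    out = np.asarray(out, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    abs_err = np.abs(out - ref)
    # Guard against division by zero where the reference is 0.
    rel_err = abs_err / np.maximum(np.abs(ref), tiny)
    return abs_err.max(), rel_err.max()

# Hypothetical values: a large entry makes a small relative error look big
# in absolute terms.
ref = np.array([1.0, 100.0, 1.0e8])
out = np.array([1.0, 100.0, 1.0e8 + 16.0])
max_abs, max_rel = max_errors(out, ref)
print(max_abs, max_rel)
```

A large absolute error next to a tiny relative error, as in the table above, suggests that the compared values are large in magnitude. For scale, float32 machine epsilon is about 1.19e-07.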
   
   ## What I have tried to solve it
   
   I changed the BatchNorm implementation in MXNet, and the output is now:
   
   ```
   caffe and numpy 0.0 0.0
   caffe and mx 0.0 0.0
   numpy and mx 0.0 0.0
   ```
   
   The modified BatchNorm (CPU) code is [here](https://github.com/wkcn/incubator-mxnet/commit/5ecd4882bc043cf059e962f7ce488270bafa07c7).
   
