Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/05 22:36:36 UTC

[GitHub] CorneliusHagmeister opened a new issue #8959: Loss becomes nan when using correlation loss with MakeLoss on 2 images.

URL: https://github.com/apache/incubator-mxnet/issues/8959
 
 
   I am trying to use an image-similarity measure as the loss function for my network. For some reason, though, the loss I get is always nan. I have already reduced my network to a minimal example, yet I still get the error.
   
   ## Environment info
   ```
   python                    3.4.5
   mxnet                     0.12.1
   ```
   
   ## Error Message:
   
   ```
   INFO:root:Epoch[0] Train-loss=nan
   INFO:root:Epoch[0] Time cost=0.759
   INFO:root:Epoch[1] Train-loss=nan
   INFO:root:Epoch[1] Time cost=0.785
   INFO:root:Epoch[2] Train-loss=nan
   INFO:root:Epoch[2] Time cost=0.798
   INFO:root:Epoch[3] Train-loss=nan
   INFO:root:Epoch[3] Time cost=0.763
   INFO:root:Epoch[4] Train-loss=nan
   INFO:root:Epoch[4] Time cost=0.773
   INFO:root:Epoch[5] Train-loss=nan
   INFO:root:Epoch[5] Time cost=0.887
   INFO:root:Epoch[6] Train-loss=nan
   INFO:root:Epoch[6] Time cost=0.917
   INFO:root:Epoch[7] Train-loss=nan
   INFO:root:Epoch[7] Time cost=0.801
   INFO:root:Epoch[8] Train-loss=nan
   INFO:root:Epoch[8] Time cost=0.860
   INFO:root:Epoch[9] Train-loss=nan
   INFO:root:Epoch[9] Time cost=1.064
   
   ```
   
   ## Minimum reproducible example
   
    ```python
    import mxnet as mx
    
    
    def conv_net_regressor(image_shape, bn_mom=0.9):
        (nchannel, height, width) = image_shape
        # Two data sources, concatenated along the channel axis
        data_fixed = mx.sym.Variable(name='data_fixed')
        data_moving = mx.sym.Variable(name='data_moving')
        concat_data = mx.sym.concat(data_fixed, data_moving, dim=1)
        batched = mx.sym.BatchNorm(data=concat_data, fix_gamma=True, eps=2e-5, momentum=bn_mom, name='bn_data')
    
        body = mx.sym.Convolution(data=batched, num_filter=20, kernel=(3, 3), stride=(1, 1), pad=(0, 0),
                                  no_bias=True, name='conv0')
        body = mx.sym.Activation(data=body, act_type='tanh', name='act0')
    
        # fc2 is not connected to the loss below; the network is reduced to a minimal example
        fc2 = mx.sym.FullyConnected(data=body, num_hidden=10)
        # fc2 = mx.sym.BlockGrad(fc2)
    
        # The loss is the Correlation of the two raw input images
        cor = mx.sym.Correlation(data1=data_fixed, data2=data_moving)
        stnet = mx.sym.MakeLoss(cor, normalization='batch')
        return stnet
   
   
   def get_symbol(image_shape):
       return conv_net_regressor(image_shape)
   
   
    if __name__ == '__main__':
        mnist_shape = (1, 28, 28)
        iterators = get_mnist_data_iterator()  # my own helper, defined elsewhere; returns the MNIST iterators
        net = get_symbol(mnist_shape)
        model = mx.mod.Module(symbol=net, context=mx.cpu(),
                              label_names=None, data_names=['data_fixed', 'data_moving'])
        # a = mx.viz.plot_network(net)
        # a.render()
    
        model.fit(iterators[0],
                  optimizer='sgd',
                  optimizer_params={'learning_rate': 0.1},
                  eval_metric=mx.metric.Loss(),
                  num_epoch=10)
   ```
   I assume I am either misusing MakeLoss or my iterator doesn't behave the way I expect. Maybe someone has an idea what's causing this.
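   For reference, here is a hypothetical NumPy sketch of the normalized cross-correlation similarity I am trying to optimize (the `ncc` helper and its `eps` guard are my own illustration, not MXNet or issue code). Evaluating it on sample arrays and checking intermediates with `np.isnan` helps rule out the metric itself as the NaN source; a zero-variance image causing division by zero is one classic culprit that `eps` avoids:

   ```python
   import numpy as np

   def ncc(fixed, moving, eps=1e-8):
       """Normalized cross-correlation in [-1, 1]; hypothetical reference only."""
       f = fixed.astype(np.float64).ravel()
       m = moving.astype(np.float64).ravel()
       f = f - f.mean()
       m = m - m.mean()
       # eps guards against division by zero when an image has zero variance
       denom = np.sqrt((f * f).sum() * (m * m).sum()) + eps
       return float((f * m).sum() / denom)

   rng = np.random.RandomState(0)
   img = rng.rand(1, 28, 28)
   print(ncc(img, img))                  # identical images -> approximately 1.0
   print(ncc(img, np.zeros_like(img)))   # constant image -> 0.0 (not NaN), thanks to eps
   ```

   In MXNet terms this would be built from `mx.sym` ops and negated (since MakeLoss minimizes), but the NumPy version is the easiest place to confirm the math stays finite.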
   
   If you need more information let me know.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services