Posted to commits@mxnet.apache.org by gi...@git.apache.org on 2017/08/27 20:08:42 UTC

[GitHub] atiyo opened a new issue #7637: Strange Validation and Training Losses at epoch change
URL: https://github.com/apache/incubator-mxnet/issues/7637
 
 
   I struggled to train some MXNet models to good accuracy, so I took a closer look at the training and validation losses of a toy model. I noticed some surprising spikes in both losses at epoch boundaries.
   
   I expect I'm doing something wrong, but I can't see what: I have tried several optimizers, with learning rates spanning several orders of magnitude, and the spikes persist. Being new to MXNet, it's quite possible I'm making a basic mistake.
   
   The graphic below illustrates the phenomenon; the code to reproduce the figure follows.
   
   ![adam_loss](https://user-images.githubusercontent.com/12828061/29753519-35b2e6e2-8b6b-11e7-8c08-14b8730efceb.png)
   
   
   ```python
   import mxnet as mx
   import numpy as np
   import matplotlib.pyplot as plt
   
   optimizer_choice = 'adam'
   learning_rate = 0.01
   batch_size = 500
   
   # Toy regression task: learn sin(x) on [0, 2*pi].
   inputs = np.expand_dims(np.random.uniform(low=0., high=2*np.pi, size=10000), axis=1)
   labels = np.sin(inputs)
   
   eval_inputs = np.expand_dims(np.random.uniform(low=0., high=2*np.pi, size=10000), axis=1)
   eval_labels = np.sin(eval_inputs)
   
   data_iter = mx.io.NDArrayIter(data={'data': inputs}, label={'label': labels},
                                 batch_size=batch_size, shuffle=True)
   eval_data_iter = mx.io.NDArrayIter(data={'data': eval_inputs}, label={'label': eval_labels},
                                      batch_size=batch_size, shuffle=True)
   
   # Small fully connected network; tanh output matches sin's [-1, 1] range.
   data = mx.sym.Variable('data')
   label = mx.sym.Variable('label')
   fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
   ac1 = mx.sym.Activation(data=fc1, act_type='relu')
   fc2 = mx.sym.FullyConnected(data=ac1, num_hidden=64)
   ac2 = mx.sym.Activation(data=fc2, act_type='relu')
   fc3 = mx.sym.FullyConnected(data=ac2, num_hidden=16)
   ac3 = mx.sym.Activation(data=fc3, act_type='relu')
   fc4 = mx.sym.FullyConnected(data=ac3, num_hidden=1)
   ac4 = mx.sym.Activation(data=fc4, act_type='tanh')
   loss = mx.sym.LinearRegressionOutput(data=ac4, label=label)
   net = mx.module.Module(symbol=loss, data_names=['data'], label_names=['label'])
   
   train_error = []
   eval_error = []
   
   def log_error(period, log):
       """Record the current metric value every `period` batches."""
       def _callback(param):
           if param.nbatch % period == 0:
               name, value = param.eval_metric.get()
               log.append(value)
       return _callback
   
   optimizer_params = {'learning_rate': learning_rate}
   net.fit(data_iter,
           optimizer=optimizer_choice,
           optimizer_params=optimizer_params,
           eval_data=eval_data_iter,
           eval_metric='mse',
           num_epoch=5,
           epoch_end_callback=mx.callback.do_checkpoint('test_net'),
           eval_batch_end_callback=log_error(1, eval_error),
           batch_end_callback=log_error(1, train_error))
   
   train_error = np.array(train_error)
   eval_error = np.array(eval_error)
   plt.plot(np.arange(train_error.size), train_error, label='Training Error')
   plt.plot(np.arange(eval_error.size), eval_error, label='Validation Error')
   plt.legend(loc='upper right')
   plt.xlabel('Batch Number')
   plt.ylabel('Error')
   plt.title('Optimizer: {}. Learning Rate: {}'.format(optimizer_choice, learning_rate))
   plt.gca().set_ylim(bottom=0)
   plt.show()
   ```
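   One thing worth ruling out first: as I understand `Module.fit`, the `param.eval_metric` seen by `batch_end_callback` is a running average that is reset only at the start of each epoch, so per-batch logging like the above mixes averages taken over different numbers of batches. The sketch below (plain Python, no MXNet; the `RunningMSE` class is a made-up stand-in, not a real MXNet API) shows how that alone can produce a jump at every epoch boundary:
   
   ```python
   class RunningMSE(object):
       """Toy stand-in for a running-average metric like mx.metric.MSE."""
       def __init__(self):
           self.reset()
   
       def reset(self):
           self.total = 0.0
           self.count = 0
   
       def update(self, batch_error):
           self.total += batch_error
           self.count += 1
   
       def get(self):
           return self.total / self.count
   
   metric = RunningMSE()
   logged = []
   per_batch_errors = [1.0, 0.5, 0.25, 0.125]  # hypothetical, steadily improving
   
   for epoch in range(2):
       metric.reset()                    # what fit() does at each epoch start
       for err in per_batch_errors:
           metric.update(err)
           logged.append(metric.get())   # what a batch_end_callback records
   
   # The logged value jumps back to 1.0 at the start of epoch 2 even though
   # the underlying per-batch error never got worse.
   ```
   
   If this is what's happening, resetting the metric inside the callback after reading it, or logging the raw batch loss directly, should smooth the curves across epoch boundaries.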
   ## Environment info
   Operating System: macOS
   
   MXNet version: 0.11.0
   
   Python version and distribution: Python 2.7.13
   
   
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services