Posted to commits@mxnet.apache.org by gi...@git.apache.org on 2017/08/27 20:08:42 UTC
[GitHub] atiyo opened a new issue #7637: Strange Validation and Training Losses at epoch change
URL: https://github.com/apache/incubator-mxnet/issues/7637
I struggled to get some mxnet models to train to a good accuracy, so I took a closer look at the training and validation losses of a toy model. I noticed some strange spikes at epoch changes, which surprised me.
I expect I'm doing something wrong, but I can't see what: I have tried several optimisers with learning rates spanning several orders of magnitude. Being new to mxnet, it's most plausible that I'm doing something drastically wrong.
The graphic below illustrates the phenomenon, followed by code to reproduce the figure:
![adam_loss](https://user-images.githubusercontent.com/12828061/29753519-35b2e6e2-8b6b-11e7-8c08-14b8730efceb.png)
```
import mxnet as mx
import numpy as np
optimizer_choice = 'adam'
learning_rate = 0.01
batch_size = 500
inputs = np.expand_dims(np.random.uniform(low=0., high=2*np.pi, size=10000), axis=1)
labels = np.sin(inputs)
eval_inputs = np.expand_dims(np.random.uniform(low=0., high=2*np.pi, size=10000), axis=1)
eval_labels = np.sin(eval_inputs)
data_iter = mx.io.NDArrayIter(data={'data':inputs}, label={'label':labels}, batch_size=batch_size, shuffle=True)
eval_data_iter = mx.io.NDArrayIter(data={'data':eval_inputs}, label={'label':eval_labels}, batch_size=batch_size, shuffle=True)
data = mx.sym.Variable('data')
label = mx.sym.Variable('label')
fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
ac1 = mx.sym.Activation(data=fc1, act_type='relu')
fc2 = mx.sym.FullyConnected(data=ac1, num_hidden=64)
ac2 = mx.sym.Activation(data=fc2, act_type='relu')
fc3 = mx.sym.FullyConnected(data=ac2, num_hidden=16)
ac3 = mx.sym.Activation(data=fc3, act_type='relu')
fc4 = mx.sym.FullyConnected(data=ac3, num_hidden=1)
ac4 = mx.sym.Activation(data=fc4, act_type='tanh')
loss = mx.symbol.LinearRegressionOutput(data=ac4, label=label)
net = mx.module.Module(symbol=loss, data_names=['data'], label_names=['label'])
train_error = []
eval_error = []
def log_error(period, log):
    # Returns a batch-end callback that appends the current metric value
    # to `log` every `period` batches.
    def _callback(param):
        if param.nbatch % period == 0:
            name, value = param.eval_metric.get()
            log.append(value)
    return _callback
optimizer_params = {'learning_rate': learning_rate}
net.fit(data_iter,
        optimizer=optimizer_choice,
        optimizer_params=optimizer_params,
        eval_data=eval_data_iter,
        eval_metric='mse',
        num_epoch=5,
        epoch_end_callback=mx.callback.do_checkpoint('test_net'),
        eval_batch_end_callback=log_error(1, eval_error),
        batch_end_callback=log_error(1, train_error))
train_error = np.array(train_error)
eval_error = np.array(eval_error)
import matplotlib.pyplot as plt
plt.plot(np.arange(train_error.size), train_error, label='Training Error')
plt.plot(np.arange(eval_error.size), eval_error, label='Validation Error')
plt.legend(loc='upper right')
plt.xlabel('Batch Number')
plt.ylabel('Error')
plt.title('Optimizer: {}. Learning Rate: {}'.format(optimizer_choice,learning_rate))
plt.gca().set_ylim(bottom=0)
plt.show()
```
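One hypothesis I'd like to check (I'm not sure this is how the Module API actually behaves, so please correct me): if the value returned by `param.eval_metric.get()` inside a batch-end callback is a running average over the current epoch, and the metric is reset at each epoch boundary, then the logged curve would jump at epoch changes even when the underlying per-batch loss decays smoothly. A minimal numpy-only sketch of that effect (no mxnet required; the reset behaviour is my assumption):

```python
import numpy as np

# A smoothly decaying per-batch loss, no noise: 5 epochs of 20 batches.
batches_per_epoch = 20
num_epochs = 5
per_batch_loss = np.exp(-0.02 * np.arange(batches_per_epoch * num_epochs))

# Simulate a metric that accumulates a cumulative average within an epoch
# and is reset at every epoch boundary (assumption about the Module API).
logged = []
for epoch in range(num_epochs):
    running_sum = 0.0  # metric reset at epoch start
    for b in range(batches_per_epoch):
        running_sum += per_batch_loss[epoch * batches_per_epoch + b]
        logged.append(running_sum / (b + 1))  # cumulative average so far

logged = np.array(logged)
# The step from one logged value to the next is small within an epoch,
# but at each boundary the value drops from the epoch's average back to a
# single-batch value, producing a visible discontinuity like in my plot.
jumps = np.abs(np.diff(logged))
boundary = jumps[batches_per_epoch - 1::batches_per_epoch]
```

If this is what's happening, the spikes would be an artefact of how the loss is logged, not of the optimisation itself.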
## Environment info
Operating System: macOS
MXNet version: 0.11.0
Python version and distribution: Python 2.7.13
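In case it helps with diagnosing this, here is a variant of my `log_error` callback that resets the metric after each read, so each logged value covers only the batches since the last read rather than a running average over the epoch. I'm assuming `param.eval_metric` follows the `mx.metric.EvalMetric` interface (which exposes `reset()`); the `FakeMetric`/`Params` stub below is only there to demonstrate the behaviour without mxnet installed:

```python
import collections

def log_error_per_batch(period, log):
    # Like log_error, but resets the metric after reading it, so each
    # logged value reflects only the batches since the previous read.
    def _callback(param):
        if param.nbatch % period == 0:
            name, value = param.eval_metric.get()
            log.append(value)
            param.eval_metric.reset()  # forget earlier batches
    return _callback

# Hypothetical stand-ins for the module's batch-end params, used only to
# demonstrate the callback here without mxnet.
class FakeMetric:
    def __init__(self):
        self.vals = []
    def update(self, v):
        self.vals.append(v)
    def get(self):
        return 'mse', sum(self.vals) / len(self.vals)
    def reset(self):
        self.vals = []

Params = collections.namedtuple('Params', ['nbatch', 'eval_metric'])

metric = FakeMetric()
log = []
cb = log_error_per_batch(1, log)
for nbatch, loss in enumerate([4.0, 2.0, 1.0]):
    metric.update(loss)
    cb(Params(nbatch=nbatch, eval_metric=metric))
# With the reset, log == [4.0, 2.0, 1.0]: each batch's own loss,
# rather than the cumulative averages [4.0, 3.0, 7/3].
```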