Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/01/21 14:16:12 UTC
[GitHub] wkcn opened a new issue #9511: set_lr_mult() or set_wd_mult() is invalid if not setting param_idx2name for the optimizer
URL: https://github.com/apache/incubator-mxnet/issues/9511
## Description
Hi, all.
I found that I have to set **param_idx2name** for the optimizer if I want **set_lr_mult() or set_wd_mult()** to take effect.
**If param_idx2name is not set, set_lr_mult() and set_wd_mult() are both silently ignored: no warning or error is raised**.
However, **it is difficult to define param_idx2name manually, because the correct mapping depends on the kvstore and on multi-GPU usage**.
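A minimal pure-Python sketch of why the multipliers are silently ignored (this is a simplified illustration, not MXNet's actual implementation): the optimizer stores lr_mult keyed by parameter *name*, but updates arrive keyed by integer *index*; with an empty idx2name the name lookup misses and falls back to the default multiplier of 1.0.

```python
# Illustrative toy optimizer; class and method names are hypothetical,
# modeled loosely on mx.optimizer.Optimizer's lookup behavior.
class ToyOptimizer:
    def __init__(self, learning_rate, param_idx2name=None):
        self.lr = learning_rate
        self.idx2name = param_idx2name or {}   # index -> parameter name
        self.lr_mult = {}                      # parameter name -> multiplier

    def set_lr_mult(self, lr_mult):
        self.lr_mult = dict(lr_mult)

    def _get_lr(self, index):
        # Multipliers are keyed by name; with an empty idx2name the
        # lookup silently misses and the default 1.0 is used.
        name = self.idx2name.get(index, index)
        return self.lr * self.lr_mult.get(name, 1.0)

opt = ToyOptimizer(0.01)
opt.set_lr_mult({"fc1_weight": 0.0})
print(opt._get_lr(0))  # 0.01 -- multiplier silently ignored

opt.idx2name = {0: "fc1_weight"}
print(opt._get_lr(0))  # 0.0 -- multiplier applied once the mapping exists
```

This is why training proceeds normally in the log below even though every lr_mult is 0.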
## Environment info (Required)
Operating System: Arch Linux 4.14.13
MXNet: [20fbda6](https://github.com/apache/incubator-mxnet/commit/20fbda6c9d15ba903fc6416baa7eecf79ab38f1b)
Python: 2.7.14/3.6.4
## Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio): gcc
MXNet commit hash:
20fbda6c9d15ba903fc6416baa7eecf79ab38f1b
Build config:
```
make -j 4 USE_OPENCV=1 USE_BLAS=openblas
```
## Minimum reproducible example
```python
import mxnet as mx
import logging
logging.getLogger().setLevel(logging.DEBUG)  # logging to stdout

mnist = mx.test_utils.get_mnist()
batch_size = 100
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=False)
val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)

data = mx.sym.var("data")
data = mx.sym.flatten(data=data)
fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
act1 = mx.sym.Activation(data=fc1, act_type="relu")
fc2 = mx.sym.FullyConnected(data=act1, num_hidden=64)
act2 = mx.sym.Activation(data=fc2, act_type="relu")
act2 = mx.sym.BatchNorm(data=act2)
fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10)
mlp = mx.sym.SoftmaxOutput(data=fc3, name="softmax")
mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())

lr = 0.01
params = mlp.list_arguments()
lr_mult = dict()
wd_mult = dict()
idx2name = dict()
for idx, name in enumerate(params):
    lr_mult[name] = 0
    idx2name[idx] = name

optimizer = mx.optimizer.SGD(learning_rate=lr,
                             momentum=0.9,
                             wd=0.0005,
                             rescale_grad=1.0 / batch_size)
optimizer.set_lr_mult(lr_mult)
optimizer.set_wd_mult(wd_mult)

mlp_model.fit(train_iter,
              eval_data=val_iter,
              optimizer=optimizer,
              eval_metric=[mx.metric.Accuracy(), mx.metric.CrossEntropy()],
              batch_end_callback=mx.callback.Speedometer(batch_size, 100),
              num_epoch=20)
```
## Steps to reproduce
1. I set lr_mult to 0 for every parameter but did not set param_idx2name for the optimizer. The result is wrong: with lr_mult = 0 the weights of the network should not be updated at all, yet the accuracy below keeps improving.
```
INFO:root:Epoch[0] Batch [100] Speed: 5695.61 samples/sec accuracy=0.531386 cross-entropy=1.513995
INFO:root:Epoch[0] Batch [200] Speed: 6095.63 samples/sec accuracy=0.877100 cross-entropy=0.442159
INFO:root:Epoch[0] Batch [300] Speed: 5751.52 samples/sec accuracy=0.921100 cross-entropy=0.281648
INFO:root:Epoch[0] Batch [400] Speed: 6200.54 samples/sec accuracy=0.933200 cross-entropy=0.231324
INFO:root:Epoch[0] Batch [500] Speed: 5996.19 samples/sec accuracy=0.937900 cross-entropy=0.210167
INFO:root:Epoch[0] Train-accuracy=0.955152
INFO:root:Epoch[0] Train-cross-entropy=0.149803
INFO:root:Epoch[0] Time cost=10.007
INFO:root:Epoch[0] Validation-accuracy=0.950700
INFO:root:Epoch[0] Validation-cross-entropy=0.161047
INFO:root:Epoch[1] Batch [100] Speed: 6367.74 samples/sec accuracy=0.955644 cross-entropy=0.147375
INFO:root:Epoch[1] Batch [200] Speed: 5722.35 samples/sec accuracy=0.961800 cross-entropy=0.133875
INFO:root:Epoch[1] Batch [300] Speed: 5332.16 samples/sec accuracy=0.965100 cross-entropy=0.116933
INFO:root:Epoch[1] Batch [400] Speed: 5303.59 samples/sec accuracy=0.966900 cross-entropy=0.117010
INFO:root:Epoch[1] Batch [500] Speed: 5561.86 samples/sec accuracy=0.964600 cross-entropy=0.121509
```
## What have you tried to solve it?
One workaround is to set param_idx2name manually for the optimizer, but this is difficult, especially in the multi-GPU case.
This [PR](https://github.com/apache/incubator-mxnet/pull/2337/commits/a77d47d5ec93512a3750c82004122cbbc0cab8a2) shows the definition of param_idx2name.
It seems that **whether a kvstore or multiple GPUs are used changes how param_idx2name has to be set**.
So I think it would be convenient to **set param_idx2name automatically** when the optimizer is initialized in mxnet.module.BaseModule.
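The automatic mapping could be sketched roughly like this (a hypothetical helper for illustration only; the actual change lives in BaseModule and must also account for how the kvstore shards parameters):

```python
# Hypothetical helper: derive param_idx2name from a symbol's argument
# list, skipping input/label names that are not trainable parameters.
def build_idx2name(arg_names, data_names=("data",), label_names=("softmax_label",)):
    excluded = set(data_names) | set(label_names)
    params = [n for n in arg_names if n not in excluded]
    return {i: n for i, n in enumerate(params)}

# Example with made-up argument names in MXNet's auto-naming style:
arg_names = ["data", "fullyconnected0_weight", "fullyconnected0_bias",
             "softmax_label"]
print(build_idx2name(arg_names))
# {0: 'fullyconnected0_weight', 1: 'fullyconnected0_bias'}
```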
Here is [the code](https://github.com/wkcn/incubator-mxnet/commit/4e89621c37490bdd03a599d5aa1bf49976fddb2d) I modified.
With this change, the result is correct when lr_mult is 0 and param_idx2name is not set: the accuracy stays at chance level (about 10% for 10 classes).
```
INFO:root:Epoch[0] Batch [100] Speed: 5697.48 samples/sec accuracy=0.079604 cross-entropy=2.302685
INFO:root:Epoch[0] Batch [200] Speed: 6142.33 samples/sec accuracy=0.080000 cross-entropy=2.302679
INFO:root:Epoch[0] Batch [300] Speed: 5620.36 samples/sec accuracy=0.082400 cross-entropy=2.302705
INFO:root:Epoch[0] Batch [400] Speed: 5679.43 samples/sec accuracy=0.084000 cross-entropy=2.302689
INFO:root:Epoch[0] Batch [500] Speed: 6029.99 samples/sec accuracy=0.079000 cross-entropy=2.302701
INFO:root:Epoch[0] Train-accuracy=0.078586
INFO:root:Epoch[0] Train-cross-entropy=2.302687
INFO:root:Epoch[0] Time cost=11.746
INFO:root:Epoch[0] Validation-accuracy=0.079100
INFO:root:Epoch[0] Validation-cross-entropy=2.302701
INFO:root:Epoch[1] Batch [100] Speed: 2341.08 samples/sec accuracy=0.079604 cross-entropy=2.302685
INFO:root:Epoch[1] Batch [200] Speed: 3169.10 samples/sec accuracy=0.080000 cross-entropy=2.302679
INFO:root:Epoch[1] Batch [300] Speed: 5883.45 samples/sec accuracy=0.082400 cross-entropy=2.302705
INFO:root:Epoch[1] Batch [400] Speed: 5527.54 samples/sec accuracy=0.084000 cross-entropy=2.302689
INFO:root:Epoch[1] Batch [500] Speed: 5744.79 samples/sec accuracy=0.079000 cross-entropy=2.302701
```