Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/01/21 14:16:12 UTC

[GitHub] wkcn opened a new issue #9511: set_lr_mult() or set_wd_mult() is invalid if not setting param_idx2name for the optimizer

URL: https://github.com/apache/incubator-mxnet/issues/9511
 
 
   ## Description
   Hi, all.
   
   I found that I have to set **param_idx2name** on the optimizer if I want **set_lr_mult() or set_wd_mult()** to take effect.
   
   **If param_idx2name is not set, set_lr_mult() and set_wd_mult() are both silently ignored, and there is no warning or error**.
   
   However, **it is difficult to define param_idx2name manually, because the correct mapping depends on the kvstore and on multi-GPU usage**.
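   
   The reason the multipliers are ignored seems to be the index-to-name lookup inside the optimizer. Below is a simplified sketch of the logic, paraphrased from `Optimizer._get_lr` in the MXNet source (the real code also handles lr_scheduler; treat this as an illustration, not the exact implementation):
   ```python
   # Paraphrased sketch of mx.optimizer.Optimizer._get_lr.
   # update() is called with an integer index, not a parameter name,
   # so the name-keyed lr_mult dict can only be reached via idx2name.
   def _get_lr(self, index):
       lr = self.lr
       if index in self.lr_mult:      # only matches integer keys
           lr *= self.lr_mult[index]
       elif index in self.idx2name:   # empty unless param_idx2name was given
           lr *= self.lr_mult.get(self.idx2name[index], 1.0)
       return lr                      # otherwise lr_mult is silently ignored
   ```
   Since set_lr_mult() stores multipliers keyed by parameter **name**, an optimizer with an empty idx2name never finds them, which matches the behaviour reported here.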
   
   ## Environment info (Required)
   Operating System: Arch Linux 4.14.13
   MXNet: [20fbda6](https://github.com/apache/incubator-mxnet/commit/20fbda6c9d15ba903fc6416baa7eecf79ab38f1b)
   Python: 2.7.14/3.6.4
   
   
   ## Build info (Required if built from source)
   
   Compiler (gcc/clang/mingw/visual studio): gcc
   
   MXNet commit hash:
   20fbda6c9d15ba903fc6416baa7eecf79ab38f1b
   
   Build config:
   ```
   make -j 4 USE_OPENCV=1 USE_BLAS=openblas
   ```
   
   ## Minimum reproducible example
   ```python
   import mxnet as mx
   import logging
   logging.getLogger().setLevel(logging.DEBUG)  # logging to stdout
   
   mnist = mx.test_utils.get_mnist()
   
   batch_size = 100
   train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=False)
   val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)
   
   data = mx.sym.var("data")
   data = mx.sym.flatten(data=data)
   
   fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
   act1 = mx.sym.Activation(data=fc1, act_type="relu")
   fc2 = mx.sym.FullyConnected(data=act1, num_hidden=64)
   act2 = mx.sym.Activation(data=fc2, act_type="relu")
   act2 = mx.sym.BatchNorm(data=act2)
   
   fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10)
   mlp = mx.sym.SoftmaxOutput(data=fc3, name="softmax")
   
   mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())
   
   lr = 0.01
   
   # Set lr_mult to 0 for every argument, keyed by parameter name.
   # idx2name is built here but deliberately NOT passed to the optimizer;
   # this is what makes set_lr_mult()/set_wd_mult() take no effect.
   params = mlp.list_arguments()
   lr_mult = dict()
   wd_mult = dict()
   idx2name = dict()
   for idx, name in enumerate(params):
       lr_mult[name] = 0
       idx2name[idx] = name
   
   optimizer = mx.optimizer.SGD(learning_rate=lr,
                                momentum=0.9,
                                wd=0.0005,
                                rescale_grad=1.0 / batch_size)
   
   optimizer.set_lr_mult(lr_mult)
   optimizer.set_wd_mult(wd_mult)
   
   mlp_model.fit(train_iter,
                 eval_data=val_iter,
                 optimizer=optimizer,
                 eval_metric=[mx.metric.Accuracy(), mx.metric.CrossEntropy()],
                 batch_end_callback=mx.callback.Speedometer(batch_size, 100),
                 num_epoch=20)
   ```
   
   ## Steps to reproduce
   
   1. Run the script above: lr_mult is 0 for every parameter, but param_idx2name is not set on the optimizer. The log below shows the wrong result: accuracy keeps improving, although with lr_mult = 0 the weights should never be updated.
   ```
   INFO:root:Epoch[0] Batch [100]	Speed: 5695.61 samples/sec	accuracy=0.531386	cross-entropy=1.513995
   INFO:root:Epoch[0] Batch [200]	Speed: 6095.63 samples/sec	accuracy=0.877100	cross-entropy=0.442159
   INFO:root:Epoch[0] Batch [300]	Speed: 5751.52 samples/sec	accuracy=0.921100	cross-entropy=0.281648
   INFO:root:Epoch[0] Batch [400]	Speed: 6200.54 samples/sec	accuracy=0.933200	cross-entropy=0.231324
   INFO:root:Epoch[0] Batch [500]	Speed: 5996.19 samples/sec	accuracy=0.937900	cross-entropy=0.210167
   INFO:root:Epoch[0] Train-accuracy=0.955152
   INFO:root:Epoch[0] Train-cross-entropy=0.149803
   INFO:root:Epoch[0] Time cost=10.007
   INFO:root:Epoch[0] Validation-accuracy=0.950700
   INFO:root:Epoch[0] Validation-cross-entropy=0.161047
   INFO:root:Epoch[1] Batch [100]	Speed: 6367.74 samples/sec	accuracy=0.955644	cross-entropy=0.147375
   INFO:root:Epoch[1] Batch [200]	Speed: 5722.35 samples/sec	accuracy=0.961800	cross-entropy=0.133875
   INFO:root:Epoch[1] Batch [300]	Speed: 5332.16 samples/sec	accuracy=0.965100	cross-entropy=0.116933
   INFO:root:Epoch[1] Batch [400]	Speed: 5303.59 samples/sec	accuracy=0.966900	cross-entropy=0.117010
   INFO:root:Epoch[1] Batch [500]	Speed: 5561.86 samples/sec	accuracy=0.964600	cross-entropy=0.121509
   ```
   
   ## What have you tried to solve it?
   
   
   One workaround is to set param_idx2name on the optimizer manually; however, it is difficult to construct correctly, especially when multiple GPUs are used.
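   
   For the simple single-device case, a minimal sketch of the workaround looks like this (it reuses the names from the example above; param_idx2name is an existing keyword of mx.optimizer.Optimizer, but the mapping below assumes the optimizer indices follow the argument order, which may break once the module excludes the data/label arguments or uses several devices — exactly the fragility described here):
   ```python
   # Workaround sketch for the single-device, no-kvstore case:
   # pass the index-to-name mapping to the optimizer explicitly.
   idx2name = {idx: name for idx, name in enumerate(mlp.list_arguments())}
   
   optimizer = mx.optimizer.SGD(learning_rate=lr,
                                momentum=0.9,
                                wd=0.0005,
                                rescale_grad=1.0 / batch_size,
                                param_idx2name=idx2name)  # set_lr_mult() now takes effect
   optimizer.set_lr_mult(lr_mult)
   optimizer.set_wd_mult(wd_mult)
   ```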
   
   The [PR](https://github.com/apache/incubator-mxnet/pull/2337/commits/a77d47d5ec93512a3750c82004122cbbc0cab8a2) shows the definition of param_idx2name.
   
   It seems that **whether a kvstore or multiple GPUs are used changes how param_idx2name must be constructed**.
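   
   For reference, below is roughly how the module builds the mapping internally when the optimizer is created from a string name (a paraphrased sketch of Module._init_optimizer; exec_group and num_devices stand in for the module's internal state, and details may vary by version):
   ```python
   # Paraphrased sketch of the idx2name construction in
   # mxnet.module.Module._init_optimizer (only done when the optimizer
   # is given as a string, which is why a user-supplied optimizer misses it).
   idx2name = {}
   if update_on_kvstore:
       # one optimizer slot per parameter
       idx2name.update(enumerate(exec_group.param_names))
   else:
       # without update_on_kvstore: one slot per parameter per device
       for k in range(num_devices):
           idx2name.update({i * num_devices + k: n
                            for i, n in enumerate(exec_group.param_names)})
   ```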
   
   So I think it would be convenient to **set param_idx2name automatically** when the optimizer is initialized in mxnet.module.BaseModule.
   
   Here is [the code](https://github.com/wkcn/incubator-mxnet/commit/4e89621c37490bdd03a599d5aa1bf49976fddb2d) I modified.
   
   With this change, the result is correct when lr_mult is set to 0 and param_idx2name is not set: accuracy stays at chance level and the cross-entropy does not decrease, i.e. the weights are frozen as expected.
   ```
   INFO:root:Epoch[0] Batch [100]	Speed: 5697.48 samples/sec	accuracy=0.079604	cross-entropy=2.302685
   INFO:root:Epoch[0] Batch [200]	Speed: 6142.33 samples/sec	accuracy=0.080000	cross-entropy=2.302679
   INFO:root:Epoch[0] Batch [300]	Speed: 5620.36 samples/sec	accuracy=0.082400	cross-entropy=2.302705
   INFO:root:Epoch[0] Batch [400]	Speed: 5679.43 samples/sec	accuracy=0.084000	cross-entropy=2.302689
   INFO:root:Epoch[0] Batch [500]	Speed: 6029.99 samples/sec	accuracy=0.079000	cross-entropy=2.302701
   INFO:root:Epoch[0] Train-accuracy=0.078586
   INFO:root:Epoch[0] Train-cross-entropy=2.302687
   INFO:root:Epoch[0] Time cost=11.746
   INFO:root:Epoch[0] Validation-accuracy=0.079100
   INFO:root:Epoch[0] Validation-cross-entropy=2.302701
   INFO:root:Epoch[1] Batch [100]	Speed: 2341.08 samples/sec	accuracy=0.079604	cross-entropy=2.302685
   INFO:root:Epoch[1] Batch [200]	Speed: 3169.10 samples/sec	accuracy=0.080000	cross-entropy=2.302679
   INFO:root:Epoch[1] Batch [300]	Speed: 5883.45 samples/sec	accuracy=0.082400	cross-entropy=2.302705
   INFO:root:Epoch[1] Batch [400]	Speed: 5527.54 samples/sec	accuracy=0.084000	cross-entropy=2.302689
   INFO:root:Epoch[1] Batch [500]	Speed: 5744.79 samples/sec	accuracy=0.079000	cross-entropy=2.302701
   ```
   
   
   
