Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/03 09:03:51 UTC

[GitHub] [incubator-mxnet] tranvanhoa533 opened a new issue #18659: Accumulate gradient with module symbolic api

tranvanhoa533 opened a new issue #18659:
URL: https://github.com/apache/incubator-mxnet/issues/18659


   Hello. I am trying to implement gradient accumulation with a simple MLP on the MNIST dataset.
   
   ## Problem
   My issue is that the model does not learn with `_grad_req = 'add'`: the score does not change. But when I train the model without gradient accumulation (`_grad_req = 'write'`), it trains perfectly.
   
   
   ## To Reproduce
   Here is my code:
   ```python
   import mxnet as mx
   import os
   import mxnet.optimizer as optimizer
   
   
   data = mx.symbol.Variable('data')
   fc1 = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
   act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
   fc2 = mx.symbol.FullyConnected(act1, name = 'fc2', num_hidden = 64)
   act2 = mx.symbol.Activation(fc2, name='relu2', act_type="relu")
   fc3 = mx.symbol.FullyConnected(act2, name='fc3', num_hidden=10)
   softmax = mx.symbol.SoftmaxOutput(fc3, name = 'softmax')
   
   n_epoch = 10
   
   accum_grad_step = 10
   accum_grad = False  # set to True to reproduce the failing grad_req='add' case
   if accum_grad:
       _grad_req = 'add'
       batch_size = 5
   else:
       _grad_req = 'write'
       batch_size = 100
   
   
   train_dataiter = mx.io.MNISTIter(
       image=os.path.join("data", "train-images-idx3-ubyte"),
       label=os.path.join("data", "train-labels-idx1-ubyte"),
       data_shape=(784,),
       batch_size=batch_size, shuffle=True, flat=True, silent=False, seed=10)
   val_dataiter = mx.io.MNISTIter(
       image=os.path.join("data", "t10k-images-idx3-ubyte"),
       label=os.path.join("data", "t10k-labels-idx1-ubyte"),
       data_shape=(784,),
       batch_size=batch_size, shuffle=True, flat=True, silent=False)
   
   
   opt = optimizer.SGD(learning_rate=0.01, momentum=0.9, rescale_grad=0.25)
   
   mod = mx.mod.Module(softmax)
   
   mod.bind(data_shapes=train_dataiter.provide_data, label_shapes=train_dataiter.provide_label, inputs_need_grad=False, grad_req=_grad_req)
   mod.init_params()
   mod.init_optimizer(optimizer=opt)
   metric = mx.metric.create('acc')
   
   for i_epoch in range(n_epoch):
       for i_iter, batch in enumerate(train_dataiter):
           
           mod.forward(batch)
           mod.update_metric(metric, batch.label)
           mod.backward()
   
           if accum_grad:
               if i_iter % accum_grad_step == 0 and i_iter > 0:
                   mod.update()
   
                   for grad in mod._exec_group.grad_arrays[0]:
                       mx.nd.zeros_like(grad, out=grad)
                 
   
           else:
               mod.update()
   
       for name, val in metric.get_name_value():
           print('epoch %03d: %s=%f' % (i_epoch, name, val))
   
       metric.reset()
       train_dataiter.reset()
   
   ```
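
Two details of the loop above can be checked without MXNet. The sketch below is a plain-Python stand-in, not the Module API. `update_steps` is a hypothetical helper that lists the iterations at which `mod.update()` would fire. The nested lists mimic what `mod._exec_group.grad_arrays` appears to hold in MXNet 1.x: one inner list per parameter, with one array per device. That layout is an assumption based on reading `executor_group.py`, not something confirmed in this thread. If it holds, indexing `[0]` resets only the first parameter's gradient while the others keep accumulating.

```python
def update_steps(n_iters, accum_steps, original=True):
    """Return the iteration indices at which an optimizer update would fire."""
    steps = []
    for i_iter in range(n_iters):
        if original:
            # Condition used in the reproduction script.
            fire = i_iter % accum_steps == 0 and i_iter > 0
        else:
            # Alternative: fire after every `accum_steps` batches.
            fire = (i_iter + 1) % accum_steps == 0
        if fire:
            steps.append(i_iter)
    return steps


def zero_first_only(grad_arrays):
    """Mimic the reset in the script: only grad_arrays[0] is cleared."""
    for grad in grad_arrays[0]:
        grad[:] = [0.0] * len(grad)


def zero_all(grad_arrays):
    """Clear every per-device array of every parameter."""
    for per_device in grad_arrays:
        for grad in per_device:
            grad[:] = [0.0] * len(grad)


def fake_grads():
    # Hypothetical stand-in: 3 parameters, 1 device, 2 values each.
    return [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]


original_steps = update_steps(30, 10, original=True)
shifted_steps = update_steps(30, 10, original=False)

partial = fake_grads()
zero_first_only(partial)

full = fake_grads()
zero_all(full)

print(original_steps)  # [10, 20]
print(shifted_steps)   # [9, 19, 29]
print(partial)         # only the first parameter is zeroed
print(full)            # every parameter is zeroed
```

With the original condition the first update covers 11 batches (indices 0 to 10) and each later one covers 10. In the real script, the equivalent of `zero_all` would loop over every entry of `grad_arrays`. Whether that change alone makes training converge has not been verified here.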
   
   
   ## Environment
   mxnet-1.6.0
   ubuntu 18.04


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] tranvanhoa533 commented on issue #18659: Accumulated gradient with module symbolic api not work

Posted by GitBox <gi...@apache.org>.
tranvanhoa533 commented on issue #18659:
URL: https://github.com/apache/incubator-mxnet/issues/18659#issuecomment-653986892


   Switching to the Gluon API is time-consuming. Can this bug be fixed with the symbolic API, @leezu?





[GitHub] [incubator-mxnet] leezu commented on issue #18659: Accumulated gradient with module symbolic api not work

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18659:
URL: https://github.com/apache/incubator-mxnet/issues/18659#issuecomment-653988497


   Certainly it can be fixed, but you may need to help investigate it yourself if you require a timely fix. That's because the Module API is deprecated and has been removed from the MXNet master branch. The API will continue to be available in MXNet 1.x versions, and a fix for that branch would be very welcome.





[GitHub] [incubator-mxnet] tranvanhoa533 commented on issue #18659: Accumulated gradient with module symbolic api not work

Posted by GitBox <gi...@apache.org>.
tranvanhoa533 commented on issue #18659:
URL: https://github.com/apache/incubator-mxnet/issues/18659#issuecomment-653992492


   How can I investigate it?





[GitHub] [incubator-mxnet] tranvanhoa533 commented on issue #18659: Accumulated gradient with module symbolic api not work

Posted by GitBox <gi...@apache.org>.
tranvanhoa533 commented on issue #18659:
URL: https://github.com/apache/incubator-mxnet/issues/18659#issuecomment-654648681


   Thanks, @leezu. It seems very complicated.





[GitHub] [incubator-mxnet] leezu commented on issue #18659: Accumulated gradient with module symbolic api not work

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18659:
URL: https://github.com/apache/incubator-mxnet/issues/18659#issuecomment-653952983


   Are you able to switch to the Gluon API?





[GitHub] [incubator-mxnet] leezu commented on issue #18659: Accumulated gradient with module symbolic api not work

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18659:
URL: https://github.com/apache/incubator-mxnet/issues/18659#issuecomment-654001298


   Test if the bug is present on older versions; if not, find the commit that introduced the bug: https://git-scm.com/book/en/v2/Git-Tools-Debugging-with-Git (see binary search at that link); https://git-scm.com/docs/git-bisect
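
As a sketch of how that search could be automated: `git bisect run <cmd>` treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The helper below is hypothetical. Its name, the accuracy inputs, and the `min_gain` threshold are assumptions for illustration. It maps the accuracies from a short training run with `grad_req='add'` to such an exit code.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_exit_code(first_epoch_acc, last_epoch_acc, min_gain=0.05):
    """Map training accuracies to an exit code for `git bisect run`.

    Pass None for either accuracy when the commit could not be built or
    the script could not run, so that the commit is skipped.
    """
    if first_epoch_acc is None or last_epoch_acc is None:
        return SKIP
    # "Learning" here means accuracy rose by at least `min_gain`
    # (a hypothetical threshold; tune it for the actual script).
    if last_epoch_acc - first_epoch_acc >= min_gain:
        return GOOD
    return BAD


print(bisect_exit_code(0.10, 0.10))  # stuck accuracy -> bad
print(bisect_exit_code(0.60, 0.95))  # accuracy improves -> good
print(bisect_exit_code(None, None))  # could not build -> skip
```

A test script would end with `sys.exit(bisect_exit_code(...))` and be started with `git bisect run python <script>` once one good and one bad commit have been marked. This only helps if some older version actually trains correctly.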

