Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/12/27 07:46:16 UTC

[GitHub] [incubator-mxnet] liuzh91 commented on issue #17086: [MKLDNN] RNN Op gradient computation is broken

URL: https://github.com/apache/incubator-mxnet/issues/17086#issuecomment-569213451
 
 
   > Hi, @liuzh91 @szhengac. We have posted #17183 to fix the gradient explosion issue in RNN Backward. Thanks again for reporting this issue. It would be greatly appreciated if you could test this patch. Thanks.
   > 
   > BTW, we got the below training log:
   > 
   > ```
   > ❯ python word_language_model.py --log-interval=1
   > /path/to/mxnet/python/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
   >   Optimizer.opt_registry[name].__name__))
   > Namespace(alpha=2, batch_size=80, beta=1, bptt=70, clip=0.25, dropout=0.4, dropout_e=0.1, dropout_h=0.2, dropout_i=0.65, emsize=400, epochs=750, eval_only=False, gpu=None, log_interval=1, lr=30, lr_update_factor=0.1, lr_update_interval=30, model='lstm', nhid=1150, nlayers=3, ntasgd=False, optimizer='sgd', save='model.params', test_mode=False, tied=False, wd=1.2e-06, weight_dropout=0.5)
   > Use AWDRNN
   > AWDRNN(
   >   (embedding): HybridSequential(
   >     (0): Embedding(33278 -> 400, float32)
   >     (1): Dropout(p = 0.65, axes=(0,))
   >   )
   >   (encoder): HybridSequential(
   >     (0): LSTM(400 -> 1150, TNC)
   >     (1): LSTM(1150 -> 1150, TNC)
   >     (2): LSTM(1150 -> 1150, TNC)
   >   )
   >   (decoder): HybridSequential(
   >     (0): Dense(None -> 33278, linear)
   >   )
   > )
   > [Epoch 0 Batch 1/372] current loss 20.50, ppl 796977445.38, throughput 18.37 samples/s, lr 30.86
   > [Epoch 0 Batch 2/372] current loss 9.51, ppl 13511.50, throughput 39.56 samples/s, lr 28.29
   > [Epoch 0 Batch 3/372] current loss 17.53, ppl 41003388.51, throughput 40.65 samples/s, lr 27.43
   > [Epoch 0 Batch 4/372] current loss 9.45, ppl 12761.47, throughput 40.39 samples/s, lr 27.43
   > [Epoch 0 Batch 5/372] current loss 14.34, ppl 1695623.66, throughput 35.59 samples/s, lr 31.71
   > [Epoch 0 Batch 6/372] current loss 9.40, ppl 12113.46, throughput 35.10 samples/s, lr 32.14
   > [Epoch 0 Batch 7/372] current loss 8.56, ppl 5232.00, throughput 37.62 samples/s, lr 30.00
   > [Epoch 0 Batch 8/372] current loss 9.32, ppl 11163.67, throughput 42.00 samples/s, lr 26.57
   > [Epoch 0 Batch 9/372] current loss 8.44, ppl 4642.37, throughput 61.95 samples/s, lr 17.14
   > [Epoch 0 Batch 10/372] current loss 8.92, ppl 7494.76, throughput 41.39 samples/s, lr 27.00
   > ```
   
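   As a side note on the log above: the `ppl` column is just the exponential of the reported loss (perplexity = exp(cross-entropy)), so the two columns can be cross-checked directly. A quick sanity check against the first two logged batches (values taken from the log; small deviations come from the printed loss being rounded to two decimals):

   ```python
   import math

   # (loss, ppl) pairs copied from the training log above
   logged = [
       (20.50, 796977445.38),  # Epoch 0 Batch 1
       (9.51, 13511.50),       # Epoch 0 Batch 2
   ]

   for loss, ppl in logged:
       # recover the unrounded loss from the logged perplexity
       implied_loss = math.log(ppl)
       # the rounded loss should match to within rounding error
       assert abs(implied_loss - loss) < 0.01, (loss, implied_loss)
   ```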
   Thank you for the patch. We will double-check whether the bug still persists.
