Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/30 06:59:53 UTC

[GitHub] [incubator-mxnet] bgawrych opened a new pull request #18203: Rnn fix

bgawrych opened a new pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203


   ## Description ##
   Fixes #17898: LSTM and GRU layers without DNNL enabled give wrong gradients.
   
   For a bidirectional LSTM with more than 2 layers, the input gradient
   calculation was incorrect. The wrong results were caused by the y
   derivative (dy) tensor being overwritten by the computed x derivative
   (dx) tensor before the right2left layer could use dy for its own
   gradient calculation.
   
   For a GRU with more than 2 layers, the i2h_weight gradient was
   incorrect for the middle layers (all except the first and last).
   The wrong calculations were caused by assigning the output pointer to
   the input instead of calculating a new input pointer.
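   The dy-overwrite can be reduced to a toy aliasing sketch (plain Python
   with hypothetical names; the actual fix lives in src/operator/rnn_impl.h
   and copies dy into temporary workspace before dx is written):

   ```python
   def grad_input_buggy(dy, dx_out):
       # Pre-fix pattern: dx_out may alias dy in the shared workspace.
       for i in range(len(dy)):
           dx_out[i] = 2.0 * dy[i]      # stand-in for the real dx computation
       return sum(dy)                   # right2left pass reads dy AFTER dx was written

   def grad_input_fixed(dy, dx_out):
       dy_snapshot = list(dy)           # copy dy to temporary space first
       for i in range(len(dy_snapshot)):
           dx_out[i] = 2.0 * dy_snapshot[i]
       return sum(dy_snapshot)          # right2left pass sees the original dy

   buf = [1.0, 2.0, 3.0]
   wrong = grad_input_buggy(buf, buf)   # dy and dx share the same storage
   buf = [1.0, 2.0, 3.0]
   right = grad_input_fixed(buf, buf)
   ```

   When the two tensors share storage, the buggy variant sums the already
   overwritten values; the fixed variant still sees the original dy.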
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ciyongch commented on pull request #18203: Fix LSTM and GRU layers gradient calculations

Posted by GitBox <gi...@apache.org>.
ciyongch commented on pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#issuecomment-628346080


   @bgawrych, please cherry-pick to both the v1.7.x and v1.x branches (as these two branches have diverged now), so that this patch will be included in the upcoming 1.7.0 release.





[GitHub] [incubator-mxnet] ciyongch commented on a change in pull request #18203: Fix LSTM and GRU layers gradient calculations

ciyongch commented on a change in pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#discussion_r420529026



##########
File path: src/operator/rnn_impl.h
##########
@@ -594,6 +595,10 @@ void LstmBackward(DType* ws,
                                      x, hx[idx], cx[idx], y, dy, dx, dhx[idx], dcx[idx],
                                      dhy_cur_ptr, dcy_cur_ptr, w_cur_ptr, dw_cur_ptr, db_cur_ptr,
                                      req_data, req_params, req_state, req_statecell);
+
+      // Prevent overwriting dy while calculating dx in left2right layer
+      const int loop_iteration = (L - 1) - i;

Review comment:
       Can you add an additional case with an even number of layers to the UT (the current UT only covers 1 and 3 layers)?
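       As an aside, parametrizing a gradient check over both odd and even layer counts follows a generic pattern; a minimal NumPy sketch (toy stacked function with hypothetical names, not MXNet's actual UT):

       ```python
       import numpy as np

       def stacked_forward(x, weights):
           """Toy stand-in for a stacked layer: one scalar weight per layer."""
           for w in weights:
               x = np.tanh(w * x)
           return x.sum()

       def analytic_grad(x, weights):
           """Backward pass via the chain rule, mirroring the forward above."""
           ys = [x]
           for w in weights:
               ys.append(np.tanh(w * ys[-1]))
           g = np.ones_like(x)
           for w, y in zip(reversed(weights), ys[:0:-1]):
               g = g * w * (1.0 - y * y)
           return g

       def numeric_grad(f, x, eps=1e-6):
           """Central-difference gradient used as the reference."""
           g = np.zeros_like(x)
           for i in range(x.size):
               xp, xm = x.copy(), x.copy()
               xp.flat[i] += eps
               xm.flat[i] -= eps
               g.flat[i] = (f(xp) - f(xm)) / (2.0 * eps)
           return g

       # Parametrize over both odd and even layer counts.
       for num_layers in (1, 2, 3, 4):
           weights = [0.5] * num_layers
           x = np.array([0.1, 0.2, -0.3])
           assert np.allclose(numeric_grad(lambda v: stacked_forward(v, weights), x),
                              analytic_grad(x, weights), atol=1e-4)
       ```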







[GitHub] [incubator-mxnet] pengzhao-intel commented on pull request #18203: Fix LSTM and GRU layers gradient calculations

pengzhao-intel commented on pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#issuecomment-624416707


   @zixuanweeei @ciyongch please help review :)





[GitHub] [incubator-mxnet] pengzhao-intel commented on pull request #18203: Fix LSTM and GRU layers gradient calculations

pengzhao-intel commented on pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#issuecomment-624421723


   @bgawrych let me know when all internal cases and the performance tests have passed, and then I will merge the PR.





[GitHub] [incubator-mxnet] zixuanweeei commented on pull request #18203: Rnn fix

zixuanweeei commented on pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#issuecomment-621662624


   Thanks for your contribution. CC @pengzhao-intel @TaoLv @ciyongch 
   
   @bgawrych Could you please rename the title to a more concise and specific one? When the PR is merged, the title will become the commit message. Thanks.





[GitHub] [incubator-mxnet] pengzhao-intel commented on a change in pull request #18203: Fix LSTM and GRU layers gradient calculations

pengzhao-intel commented on a change in pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#discussion_r417904959



##########
File path: src/operator/rnn-inl.h
##########
@@ -195,7 +195,9 @@ inline size_t GetRNNWorkspaceSize(index_t seq_length,
     case rnn_enum::kLstm:
       size = seq_length * batch_size * hidden_size * (4 + direction) +  // wx*x + inter-y
           batch_size * hidden_size * 6 +                                // wh*h + h + c
-          seq_length * hidden_size * 8;                    // Used in Backward, Δbx, Δbh
+          seq_length * hidden_size * 8 +                   // Used in Backward, Δbx, Δbh
+          // temporary dy in backward computation for bidirectional layers

Review comment:
       Add a TODO (with your name) to track this issue
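       For reference, the pre-existing workspace terms visible in the quoted hunk can be written out as a small sketch (Python; the function name is hypothetical, and the new temporary-dy term added by this PR is truncated in the quoted diff, so it is deliberately omitted here):

       ```python
       def lstm_workspace_size(seq_length, batch_size, hidden_size, direction):
           # Mirrors the terms shown in the quoted hunk of GetRNNWorkspaceSize.
           # The extra temporary-dy term introduced by this PR is cut off in
           # the quoted diff, so it is not reproduced here.
           return (seq_length * batch_size * hidden_size * (4 + direction)  # wx*x + inter-y
                   + batch_size * hidden_size * 6                           # wh*h + h + c
                   + seq_length * hidden_size * 8)                          # backward: dbx, dbh
       ```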







[GitHub] [incubator-mxnet] pengzhao-intel merged pull request #18203: Fix LSTM and GRU layers gradient calculations

pengzhao-intel merged pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203


   





[GitHub] [incubator-mxnet] bgawrych commented on pull request #18203: Rnn fix

bgawrych commented on pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#issuecomment-621664013


   > Thanks for your contribution. CC @pengzhao-intel @TaoLv @ciyongch
   > 
   > @bgawrych Could you please rename the title to a more concise and specific one? When the PR is merged, the title will become the commit message. Thanks.
   
   Of course, I overlooked this. Sorry :)
   





[GitHub] [incubator-mxnet] pengzhao-intel commented on pull request #18203: Fix LSTM and GRU layers gradient calculations

pengzhao-intel commented on pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#issuecomment-621743099


   Thanks @bgawrych, yes, we need to cherry-pick this to 1.x.





[GitHub] [incubator-mxnet] pengzhao-intel commented on pull request #18203: Fix LSTM and GRU layers gradient calculations

pengzhao-intel commented on pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#issuecomment-628344017


   Please also cherry-pick to 1.x branch.





[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18203: Rnn fix

mxnet-bot commented on pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#issuecomment-621651215


   Hey @bgawrych, thanks for submitting the PR.
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [miscellaneous, sanity, clang, website, windows-cpu, centos-gpu, centos-cpu, unix-cpu, unix-gpu, edge, windows-gpu]
   *** 
   _Note_: 
 Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   





[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #18203: Fix LSTM and GRU layers gradient calculations

bgawrych commented on a change in pull request #18203:
URL: https://github.com/apache/incubator-mxnet/pull/18203#discussion_r417909239



##########
File path: src/operator/rnn-inl.h
##########
@@ -195,7 +195,9 @@ inline size_t GetRNNWorkspaceSize(index_t seq_length,
     case rnn_enum::kLstm:
       size = seq_length * batch_size * hidden_size * (4 + direction) +  // wx*x + inter-y
           batch_size * hidden_size * 6 +                                // wh*h + h + c
-          seq_length * hidden_size * 8;                    // Used in Backward, Δbx, Δbh
+          seq_length * hidden_size * 8 +                   // Used in Backward, Δbx, Δbh
+          // temporary dy in backward computation for bidirectional layers

Review comment:
       What I meant here is the temporary space required for the derivative in the r2l layer, not a temporary solution.




