You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/12/02 03:15:16 UTC

[GitHub] [incubator-mxnet] waytrue17 opened a new pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

waytrue17 opened a new pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614


   ## Description ##
   MXNet will error out during the backward propagation if CachedOp is present in the computational graph. This blocks the  custom subgraph partitioning #18823 and the dynamic shape ops optimization #18690. 
   
   Currently CachedOp is used as the symbol executor (external CachedOp) to run the whole graph. It is also used as individual operator inside the graph (internal CachedOp) that combines with other operators. For backward prop, the external CachedOp should be invoked through [`CachedOp::Backward()`](https://github.com/apache/incubator-mxnet/blob/11a7903c09b07f741cf81191be77d48fa8f7f584/src/imperative/cached_op.cc#L1016) while the internal CachedOp should be invoked through [`CachedOpBackward()`](https://github.com/apache/incubator-mxnet/blob/11a7903c09b07f741cf81191be77d48fa8f7f584/src/imperative/cached_op.cc#L1130). However, currently both external and internal CachedOps go through [`CachedOp::Backward()`](https://github.com/apache/incubator-mxnet/blob/11a7903c09b07f741cf81191be77d48fa8f7f584/src/imperative/imperative_utils.cc#L91), which causes the issue. This PR introduces a flag `is_executor` to differentiate the way we invoke the two CachedOps.
   ## Checklist ##
   ### Essentials ###
   - [ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-740086515


   @mxnet-bot run ci [edge]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 merged pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
mseth10 merged pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#discussion_r536620855



##########
File path: src/imperative/cached_op.cc
##########
@@ -327,9 +327,10 @@ bool CachedOp::SetBackwardGraph(
   node_range = {num_forward_nodes, idx.num_nodes()};
   entry_range = {num_forward_entries, idx.num_node_entries()};
 
+  bool contain_dynamic_shape = false;

Review comment:
       why do we need this? this variable is not used in this function.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-736961104


   Hey @waytrue17 , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [unix-gpu, clang, sanity, centos-cpu, miscellaneous, website, centos-gpu, windows-cpu, edge, windows-gpu, unix-cpu]
   *** 
   _Note_: 
    Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#discussion_r537733240



##########
File path: src/imperative/cached_op.cc
##########
@@ -327,9 +327,10 @@ bool CachedOp::SetBackwardGraph(
   node_range = {num_forward_nodes, idx.num_nodes()};
   entry_range = {num_forward_entries, idx.num_node_entries()};
 
+  bool contain_dynamic_shape = false;

Review comment:
       This variable is used in `CheckAndInferShape` to handle cases where we have unknown shapes during backward. The CI failed on a test case and adding this variable seems fix it. We also have the same variable existed in `CachedOp::SetForwardGraph`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-740262866


   @mxnet-bot run ci [edge]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
mseth10 commented on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-742275843


   This PR fixes #18823 and #19524 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 edited a comment on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
waytrue17 edited a comment on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-740090186


   @mseth10  Result for gluon-nlp tests:
   > python3 -m pytest --forked --device="cpu" test_attention_cell.py
   
   Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1825779130 to reproduce.
   =================================================================== test session starts ====================================================================
   platform linux -- Python 3.6.9, pytest-6.1.2, py-1.9.0, pluggy-0.13.1
   rootdir: /home/ubuntu/project/nlp, configfile: pytest.ini
   plugins: forked-1.3.0
   collected 113 items                                                                                                                                        
   
   test_attention_cell.py .................................................................................................................             [100%]
   
   ===================================================================== warnings summary =====================================================================
   ../../master-mx/python/mxnet/optimizer/optimizer.py:163
     /home/ubuntu/project/master-mx/python/mxnet/optimizer/optimizer.py:163: UserWarning: WARNING: New optimizer gluonnlp.optimizer.AdamW is overriding existing optimizer mxnet.optimizer.adamW.AdamW
       Optimizer.opt_registry[name].__name__))
   
   -- Docs: https://docs.pytest.org/en/stable/warnings.html
   ============================================================= 113 passed, 1 warning in 30.48s ==============================================================
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
mseth10 commented on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-739155103


   let's also verify that it's not breaking gluonnlp CI. Can you post the output of the following?
   ```
   git clone -b master https://github.com/dmlc/gluon-nlp.git
   cd gluon-nlp
   python3 -m pip install -U -e .
   cd tests
   python3 -m pip install pytest-forked
   python3 -m pytest --forked --device="cpu" test_attention_cell.py
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#discussion_r538046097



##########
File path: src/imperative/cached_op.cc
##########
@@ -327,9 +327,10 @@ bool CachedOp::SetBackwardGraph(
   node_range = {num_forward_nodes, idx.num_nodes()};
   entry_range = {num_forward_entries, idx.num_node_entries()};
 
+  bool contain_dynamic_shape = false;

Review comment:
       Removed the variable and passed CI. Seems we do not need it anymore.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-739147006


   > Could you add a testcase that fails with partition_if_dynamic=True prior to the fix?
   
   @leezu Test case added. Please review, thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 edited a comment on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
waytrue17 edited a comment on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-740090186


   @mseth10  It passed all cases in `test_attention_cell.py`, here is the output:
   > python3 -m pytest --forked --device="cpu" test_attention_cell.py
   
   Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1825779130 to reproduce.
   =================================================================== test session starts ====================================================================
   platform linux -- Python 3.6.9, pytest-6.1.2, py-1.9.0, pluggy-0.13.1
   rootdir: /home/ubuntu/project/nlp, configfile: pytest.ini
   plugins: forked-1.3.0
   collected 113 items                                                                                                                                        
   
   test_attention_cell.py .................................................................................................................             [100%]
   
   ===================================================================== warnings summary =====================================================================
   ../../master-mx/python/mxnet/optimizer/optimizer.py:163
     /home/ubuntu/project/master-mx/python/mxnet/optimizer/optimizer.py:163: UserWarning: WARNING: New optimizer gluonnlp.optimizer.AdamW is overriding existing optimizer mxnet.optimizer.adamW.AdamW
       Optimizer.opt_registry[name].__name__))
   
   -- Docs: https://docs.pytest.org/en/stable/warnings.html
   ============================================================= 113 passed, 1 warning in 30.48s ==============================================================
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-736966577


   @mseth10 @sxjscience Can you please review? thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#discussion_r537736621



##########
File path: src/imperative/cached_op.cc
##########
@@ -327,9 +327,10 @@ bool CachedOp::SetBackwardGraph(
   node_range = {num_forward_nodes, idx.num_nodes()};
   entry_range = {num_forward_entries, idx.num_node_entries()};
 
+  bool contain_dynamic_shape = false;

Review comment:
       CheckAndInferShape sets that variable if passed. In SetForwardGraph, the variable is used later in the same function.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#discussion_r538046097



##########
File path: src/imperative/cached_op.cc
##########
@@ -327,9 +327,10 @@ bool CachedOp::SetBackwardGraph(
   node_range = {num_forward_nodes, idx.num_nodes()};
   entry_range = {num_forward_entries, idx.num_node_entries()};
 
+  bool contain_dynamic_shape = false;

Review comment:
       Removed the variable and passed CI. Seems we do not need it anymore in our latest fix.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#discussion_r538482004



##########
File path: src/imperative/cached_op.cc
##########
@@ -327,9 +327,10 @@ bool CachedOp::SetBackwardGraph(
   node_range = {num_forward_nodes, idx.num_nodes()};
   entry_range = {num_forward_entries, idx.num_node_entries()};
 
+  bool contain_dynamic_shape = false;

Review comment:
       update PR description




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-740262884


   Jenkins CI successfully triggered : [edge]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-740090186


   Result for gluon-nlp tests:
   > python3 -m pytest --forked --device="cpu" test_attention_cell.py
   
   Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1825779130 to reproduce.
   =================================================================== test session starts ====================================================================
   platform linux -- Python 3.6.9, pytest-6.1.2, py-1.9.0, pluggy-0.13.1
   rootdir: /home/ubuntu/project/nlp, configfile: pytest.ini
   plugins: forked-1.3.0
   collected 113 items                                                                                                                                        
   
   test_attention_cell.py .................................................................................................................             [100%]
   
   ===================================================================== warnings summary =====================================================================
   ../../master-mx/python/mxnet/optimizer/optimizer.py:163
     /home/ubuntu/project/master-mx/python/mxnet/optimizer/optimizer.py:163: UserWarning: WARNING: New optimizer gluonnlp.optimizer.AdamW is overriding existing optimizer mxnet.optimizer.adamW.AdamW
       Optimizer.opt_registry[name].__name__))
   
   -- Docs: https://docs.pytest.org/en/stable/warnings.html
   ============================================================= 113 passed, 1 warning in 30.48s ==============================================================
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19614: [BUGFIX] Fix backward pass for nested CachedOp

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19614:
URL: https://github.com/apache/incubator-mxnet/pull/19614#issuecomment-740086564


   Jenkins CI successfully triggered : [edge]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org