Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/10 20:40:10 UTC

[GitHub] [incubator-mxnet] mseth10 opened a new pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

mseth10 opened a new pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690


   ## Description ##
   (Brief description of what this PR is about)
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
   - [ ] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments are documented. 
   - For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
   - Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
   - [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be made.
   - Interesting edge cases to note here
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r512991795



##########
File path: include/mxnet/base.h
##########
@@ -532,6 +532,9 @@ inline std::ostream& operator<<(std::ostream &out, const Context &ctx) {
   return out;
 }
 
+// convert nnvm symbol to a nnvm graph.
+nnvm::Graph Symbol2Graph(const nnvm::Symbol &s);

Review comment:
       yes







[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r512859942



##########
File path: include/mxnet/base.h
##########
@@ -532,6 +532,9 @@ inline std::ostream& operator<<(std::ostream &out, const Context &ctx) {
   return out;
 }
 
+// convert nnvm symbol to a nnvm graph.
+nnvm::Graph Symbol2Graph(const nnvm::Symbol &s);

Review comment:
       Still planning to remove this?







[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r464748849



##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -2665,11 +2665,30 @@ def _optimize_for_dynamic_shape_op(self, is_np_array):
             - If true, output type is np symbol, otherwise nd symbol.
         """
         out = SymbolHandle()
-        check_call(_LIB.MXOptimizeForDynamicShapeOp(self.handle, ctypes.byref(out)))
+        has_dynamic_shape = ctypes.c_bool(False)
+
+        # Pass static_alloc and static_shape flags into the C API
+        key_list = []
+        val_list = []
+        for flag in flags:
+            key, val = flag[0], flag[1]
+            if key == ['static_alloc', 'static_shape']:

Review comment:
       Shouldn't it be `if key in ['static_alloc', 'static_shape']:`?
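   For illustration, a minimal sketch of the difference (the flag names come from the diff above; the surrounding loop shape is assumed):

    ```python
    flags = [('static_alloc', True), ('static_shape', True), ('inline_limit', 2)]
    for key, val in flags:
        # `key == ['static_alloc', 'static_shape']` compares a string against a
        # whole list and is never True; `key in [...]` tests membership,
        # which is what this loop intends
        if key in ['static_alloc', 'static_shape']:
            print(key, val)  # picks up static_alloc and static_shape only
    ```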







[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r456130118



##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -1470,6 +1470,10 @@ def optimize_for(self, backend, args=None, aux=None, ctx=None,
         ctx : Context, optional
             Device context, used to infer stypes
 
+        is_np_sym : boolean, optional
+            Output symbol type
+            - If true, output type is np symbol, otherwise nd symbol.
+

Review comment:
       nd symbol will be removed soon anyway. Can you rely on the global `is_np` state (and, if needed, adapt the state in the `build_cache` function)?
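   A hedged sketch of what relying on the global state could look like (`_wrap_symbol` is a hypothetical helper, not existing code; `is_np_array` is the util flag this PR already uses elsewhere):

    ```python
    from mxnet.util import is_np_array

    def _wrap_symbol(handle):
        # hypothetical helper: choose the output symbol class from the global
        # numpy-semantics state instead of an explicit is_np_sym argument
        if is_np_array():
            from mxnet.symbol.numpy import _Symbol as np_symbol
            return np_symbol(handle)
        from mxnet.symbol import Symbol
        return Symbol(handle)
    ```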

##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -2627,6 +2633,24 @@ def detach(self):
     def backward(self):
         raise NotImplementedForSymbol(self.backward, None)
 
+    def optimize_for_dynamic_shape_op(self, is_np_sym=False):

Review comment:
       Why is this a public API even though it's called automatically in `_build_cache`? Should it be private?







[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r460591522



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -360,6 +405,17 @@ void SelectSubgraphNodes(nnvm::Graph* g, SubgraphSelectorV2Ptr subgraph_selector
     // filter out unqualified pre-selected nodes
     std::vector<BiDirectedNode*> filtered_nodes = subgraph_selector->Filter(preselected_nodes);
 
+    const SubgraphPropertyPtr& subg_prop = g->GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      // check if subgraph has external input.
+      // if not, reject the first op (in top order) from the subgraph
+      // to make sure CachedOp gets external input.
+      if (filtered_nodes.size() > 0 && !HasInputEntries(*g, simple_nodes, filtered_nodes)) {
+        filtered_nodes.erase(filtered_nodes.begin());

Review comment:
       Why are we calling `Filter` twice? Can we move `ensure_CachedOp_input` before the `Filter` call and check for input entries in `PreSelectSubgraphNodes`?







[GitHub] [incubator-mxnet] waytrue17 edited a comment on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 edited a comment on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-665993052


   > Thanks @mseth10 for investigating the root cause. Could you add a new test that contains the `reshape` after dynamic shape?
   
   Test case added. Thanks for your suggestions and help on this issue.
   





[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r460585378



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -360,6 +405,17 @@ void SelectSubgraphNodes(nnvm::Graph* g, SubgraphSelectorV2Ptr subgraph_selector
     // filter out unqualified pre-selected nodes
     std::vector<BiDirectedNode*> filtered_nodes = subgraph_selector->Filter(preselected_nodes);
 
+    const SubgraphPropertyPtr& subg_prop = g->GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      // check if subgraph has external input.
+      // if not, reject the first op (in top order) from the subgraph
+      // to make sure CachedOp gets external input.
+      if (filtered_nodes.size() > 0 && !HasInputEntries(*g, simple_nodes, filtered_nodes)) {
+        filtered_nodes.erase(filtered_nodes.begin());

Review comment:
       Thanks for bringing this up! I agree that we should run the filter function after removing the first node.







[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r472542081



##########
File path: python/mxnet/gluon/block.py
##########
@@ -911,6 +911,8 @@ def __init__(self):
         self._monitor_all = False
         self._backend = None
         self._backend_opts = {}
+        self._final_partitioned = False
+        self._cached_op_args = []

Review comment:
       `_final_partitioned` is a flag used to ensure that we only partition once; it's set in `_call_cached_op`.
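   A minimal runnable sketch of that guard (a hypothetical simplification of HybridBlock's caching logic, not the actual diff):

    ```python
    class _Block:
        """Hypothetical simplification of HybridBlock's caching logic."""
        def __init__(self, cached_op):
            self._final_partitioned = False
            self._cached_op = cached_op

        def _partition_static_shape_ops(self):
            pass  # placeholder for the real partitioning pass

        def _call_cached_op(self, *args):
            if not self._final_partitioned:
                self._final_partitioned = True  # partition at most once
                self._partition_static_shape_ops()
            return self._cached_op(*args)
    ```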







[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r512901227



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -329,7 +368,21 @@ void PreSelectSubgraphNodes(const nnvm::Graph& g, SubgraphSelectorV2Ptr subgraph
     }
     ++count;
   }
-  if (!success) {
+  if (success) {
+    // CachedOp requires external input during forward pass
+    // check subgraph input. If none, reject the first op (in top order) from the subgraph
+    // to make sure CachedOp gets external input.
+    // this feature can be switched off by setting ensure_CachedOp_input to false
+    const SubgraphPropertyPtr& subg_prop = g.GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      if (subgraph_nodes->size() > 0 && !HasInputEntries(g, simple_nodes, *subgraph_nodes)) {
+        // relabel the node to -1
+        (*subgraph_nodes)[0]->label = -1;
+        subgraph_nodes->erase(subgraph_nodes->begin());

Review comment:
       Can this check be done inside the `Filter` API of the subgraph property? And then reject the single node if needed there?
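   A sketch of the suggested `Filter`-side rejection (in Python for brevity; the real `Filter` API is C++, and making the input-entry information available there is an open question discussed below):

    ```python
    def filter_nodes(candidates, has_input_entries):
        """Drop the topologically-first op when the candidate subgraph has no
        external input, so the resulting CachedOp gets at least one input."""
        if candidates and not has_input_entries(candidates):
            return candidates[1:]
        return candidates

    # e.g. a candidate subgraph built purely from zeros-like ops:
    print(filter_nodes(['zeros1', 'zeros2', 'add'], lambda nodes: False))
    ```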







[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r512978904



##########
File path: python/mxnet/gluon/block.py
##########
@@ -1094,8 +1094,57 @@ def _deferred_infer_shape(self, *args):
             raise ValueError(error_msg)
 
     def _call_cached_op(self, *args):
-        if self._cached_op is None:
+        if not self._cached_op_args or not self._cached_graph:
             self._build_cache(*args)
+
+        if self._first_forward:
+            # partition static shape ops if the graph contains any dynamic shape op
+            self._first_forward = False
+            data, out = self._cached_graph
+            out, is_dynamic = out._optimize_for_dynamic_shape_op(is_np_array(), self._flags)
+            self._cached_graph = data, out

Review comment:
       Right, we forgot to remove this line after the dynamic shape partitioning. Will address it.







[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r512994213



##########
File path: include/mxnet/base.h
##########
@@ -532,6 +532,9 @@ inline std::ostream& operator<<(std::ostream &out, const Context &ctx) {
   return out;
 }
 
+// convert nnvm symbol to a nnvm graph.
+nnvm::Graph Symbol2Graph(const nnvm::Symbol &s);

Review comment:
       removed







[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r512864125



##########
File path: python/mxnet/gluon/block.py
##########
@@ -1094,8 +1094,57 @@ def _deferred_infer_shape(self, *args):
             raise ValueError(error_msg)
 
     def _call_cached_op(self, *args):
-        if self._cached_op is None:
+        if not self._cached_op_args or not self._cached_graph:
             self._build_cache(*args)
+
+        if self._first_forward:
+            # partition static shape ops if the graph contains any dynamic shape op
+            self._first_forward = False
+            data, out = self._cached_graph
+            out, is_dynamic = out._optimize_for_dynamic_shape_op(is_np_array(), self._flags)

Review comment:
       Are you still planning on replacing this with `optimize_for` instead of creating your own Python API binding?







[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r472540805



##########
File path: python/mxnet/gluon/block.py
##########
@@ -911,6 +911,8 @@ def __init__(self):
         self._monitor_all = False
         self._backend = None
         self._backend_opts = {}
+        self._final_partitioned = False
+        self._cached_op_args = []

Review comment:
       What’s this for and where is it being set?







[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r520987207



##########
File path: tests/python/unittest/test_dynamic_shape.py
##########
@@ -47,3 +47,26 @@ def hybrid_forward(self, F, data, index):
     assert_almost_equal(result.asnumpy(), result_nd)
     assert_almost_equal(data.grad.asnumpy(), data_grad_nd)
 
+def test_dynamic_shape_with_reshape():
+    # test dynamic shape op followed by reshape op
+    class _TestBlock(gluon.HybridBlock):
+
+        def __init__(self):
+            super(_TestBlock, self).__init__()
+
+        def hybrid_forward(self, F, data, index):
+            return F.contrib.boolean_mask(data, index).reshape((-1, ))

Review comment:
       @szha Added more test cases for multiple hybridize calls.
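   A hedged sketch of what such a multiple-hybridize test could look like (the block mirrors the diff above; the exact flags and assertions are assumptions):

    ```python
    import mxnet as mx
    from mxnet import gluon
    from mxnet.test_utils import assert_almost_equal

    class _TestBlock(gluon.HybridBlock):
        def hybrid_forward(self, F, data, index):
            return F.contrib.boolean_mask(data, index).reshape((-1,))

    block = _TestBlock()
    data = mx.nd.array([[1, 2], [3, 4], [5, 6]])
    index = mx.nd.array([0, 1, 1])

    block.hybridize()
    out1 = block(data, index)           # first hybridized forward
    block.hybridize(static_alloc=True)  # re-hybridize with different flags
    out2 = block(data, index)           # graph is rebuilt and repartitioned
    assert_almost_equal(out1.asnumpy(), out2.asnumpy())
    ```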







[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r459832436



##########
File path: src/operator/subgraph/static_shape_subgraph_property.cc
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "./common.h"
+#include "./subgraph_property.h"
+#include "../../imperative/cached_op.h"
+
+namespace mxnet {
+namespace op {
+
+/*
+ * This selects nodes for a subgraph that only contains static shape operators
+ * and it visits nodes via both input and output links.
+ */
+class StaticShapeOpSelector: public SubgraphSelector {
+ public:
+  virtual bool Select(const nnvm::Node &seed_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !seed_node.is_variable() && infershape.count(seed_node.op());
+  }
+
+  virtual bool SelectInput(const nnvm::Node &cur_node, const nnvm::Node &input_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !input_node.is_variable() && infershape.count(input_node.op());
+  }
+
+  virtual bool SelectOutput(const nnvm::Node &cur_node, const nnvm::Node &output_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !output_node.is_variable() && infershape.count(output_node.op());
+  }

Review comment:
       Added the single-node rejection mechanism in the subgraph property's `Filter` function.







[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r510732221



##########
File path: include/mxnet/base.h
##########
@@ -532,6 +532,9 @@ inline std::ostream& operator<<(std::ostream &out, const Context &ctx) {
   return out;
 }
 
+// convert nnvm symbol to a nnvm graph.
+nnvm::Graph Symbol2Graph(const nnvm::Symbol &s);

Review comment:
       @waytrue17 is this needed now?







[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r517095153



##########
File path: python/mxnet/gluon/block.py
##########
@@ -1243,6 +1308,7 @@ def hybridize(self, active=True, backend=None, backend_opts=None, clear=True, **
             self._backend_opts = backend_opts
 
         self._active = active
+        self._partition_if_dynamic = partition_if_dynamic
         self._flags = list(kwargs.items())
         if clear:
             self._clear_cached_op()

Review comment:
       Clearing `_cached_op_args` is now included in `_clear_cached_op()`.
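   A minimal sketch of the resulting behavior (an assumed simplification of `_clear_cached_op`, not the exact diff):

    ```python
    def _clear_cached_op(self):
        # assumed simplification: the cached graph, the CachedOp, and its
        # argument list are now all reset together
        self._cached_graph = ()
        self._cached_op = None
        self._cached_op_args = []
    ```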







[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r517658804



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -329,7 +368,21 @@ void PreSelectSubgraphNodes(const nnvm::Graph& g, SubgraphSelectorV2Ptr subgraph
     }
     ++count;
   }
-  if (!success) {
+  if (success) {
+    // CachedOp requires external input during forward pass
+    // check subgraph input. If none, reject the first op (in top order) from the subgraph
+    // to make sure CachedOp gets external input.
+    // this feature can be switched off by setting ensure_CachedOp_input to false
+    const SubgraphPropertyPtr& subg_prop = g.GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      if (subgraph_nodes->size() > 0 && !HasInputEntries(g, simple_nodes, *subgraph_nodes)) {
+        // relabel the node to -1
+        (*subgraph_nodes)[0]->label = -1;
+        subgraph_nodes->erase(subgraph_nodes->begin());

Review comment:
       Sounds good. Can we rename the flag to something more general like `requireSubgraphInputs`?







[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r464778682



##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -2665,11 +2665,30 @@ def _optimize_for_dynamic_shape_op(self, is_np_array):
             - If true, output type is np symbol, otherwise nd symbol.
         """
         out = SymbolHandle()
-        check_call(_LIB.MXOptimizeForDynamicShapeOp(self.handle, ctypes.byref(out)))
+        has_dynamic_shape = ctypes.c_bool(False)
+
+        # Pass static_allco and static_shape flags into c_api
+        key_list = []
+        val_list = []
+        for flag in flags:
+            key, val = flag[0], flag[1]
+            if key == ['static_alloc', 'static_shape']:

Review comment:
       Yes, thanks for catching that







[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-656879963


   Hey @mseth10 , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [centos-cpu, windows-gpu, unix-cpu, sanity, miscellaneous, windows-cpu, unix-gpu, edge, clang, website, centos-gpu]
   *** 
   _Note_: 
    Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   





[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r512991561



##########
File path: python/mxnet/gluon/block.py
##########
@@ -1094,8 +1094,57 @@ def _deferred_infer_shape(self, *args):
             raise ValueError(error_msg)
 
     def _call_cached_op(self, *args):
-        if self._cached_op is None:
+        if not self._cached_op_args or not self._cached_graph:
             self._build_cache(*args)
+
+        if self._first_forward:
+            # partition static shape ops if the graph contains any dynamic shape op
+            self._first_forward = False
+            data, out = self._cached_graph
+            out, is_dynamic = out._optimize_for_dynamic_shape_op(is_np_array(), self._flags)
+            self._cached_graph = data, out

Review comment:
       We are still evaluating moving this logic to the C++ CachedOp vs. just reusing `optimize_for` here. In the latter case too, not updating `self._cached_graph` after dynamic shape partitioning should work.
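   For reference, a hedged sketch of the `optimize_for` route; the backend name `'static_shape'` is an assumption about how `StaticShapeSubgraphProperty` would be registered, and passing the CachedOp flags as keyword arguments is likewise assumed:

    ```python
    # inside _call_cached_op: reuse the generic partitioning entry point
    # instead of a dedicated Python API binding
    data, out = self._cached_graph
    out = out.optimize_for('static_shape', static_alloc=True, static_shape=True)
    # per the discussion above, self._cached_graph is intentionally left
    # unmodified after this partitioning step
    ```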







[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r519112478



##########
File path: src/operator/subgraph/static_shape_subgraph_property.cc
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "./common.h"
+#include "./subgraph_property.h"
+#include "../../imperative/cached_op.h"
+
+namespace mxnet {
+namespace op {
+
+/*
+ * This selects nodes for a subgraph that only contains static shape operators
+ * and it visits nodes via both input and output links.
+ */
+class StaticShapeOpSelector: public SubgraphSelector {
+ public:
+  // select nodes with FInferShape attribute
+  bool Select(const nnvm::Node &seed_node) override {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !seed_node.is_variable() && infershape.count(seed_node.op()) &&
+           !unsupported_op_names_.count(seed_node.op()->name);
+  }
+
+  bool SelectInput(const nnvm::Node &cur_node, const nnvm::Node &input_node) override {
+    return Select(input_node);
+  }
+
+  bool SelectOutput(const nnvm::Node &cur_node, const nnvm::Node &output_node) override {
+    return Select(output_node);
+  }
+
+  // reject single node subgraph
+  std::vector<nnvm::Node*> Filter(const std::vector<nnvm::Node*>& candidates) override {
+    if (candidates.size() == 1) {
+      return std::vector<nnvm::Node*>();
+    }
+    return candidates;
+  }
+
+ private:
+    // static shape ops that fail backward pass inside subgraph CachedOp
+    // GitHub issue: https://github.com/apache/incubator-mxnet/issues/18823
+    std::unordered_set<std::string> unsupported_op_names_ {"Reshape", "_np_reshape", "transpose",

Review comment:
       What did you guys decide to do about these fixed ops to exclude? Is it worth having an option for users to specify a list of ops to exclude, or is that something we should consider in the future based on the feedback you get after this PR?
   
   Since you added the top-level `partition_if_dynamic` option, users can always turn it off if it doesn't work correctly. So that should cover the case where this change breaks existing use cases (if at all). 
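   One hypothetical shape such a user-facing option could take, reusing the existing `backend`/`backend_opts` plumbing of `hybridize()` shown earlier; the `exclude_ops` key is purely illustrative, not an existing option:

    ```python
    block.hybridize(backend='static_shape',
                    backend_opts={'exclude_ops': ['Reshape', 'transpose']})
    ```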







[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r459798576



##########
File path: src/operator/subgraph/static_shape_subgraph_property.cc
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "./common.h"
+#include "./subgraph_property.h"
+#include "../../imperative/cached_op.h"
+
+namespace mxnet {
+namespace op {
+
+/*
+ * This selects nodes for a subgraph that only contains static shape operators
+ * and it visits nodes via both input and output links.
+ */
+class StaticShapeOpSelector: public SubgraphSelector {
+ public:
+  virtual bool Select(const nnvm::Node &seed_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !seed_node.is_variable() && infershape.count(seed_node.op());
+  }
+
+  virtual bool SelectInput(const nnvm::Node &cur_node, const nnvm::Node &input_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !input_node.is_variable() && infershape.count(input_node.op());
+  }
+
+  virtual bool SelectOutput(const nnvm::Node &cur_node, const nnvm::Node &output_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !output_node.is_variable() && infershape.count(output_node.op());
+  }

Review comment:
       Right, there is no advantage to having a single-node subgraph; we can reject such subgraphs in the subgraph property's `CreateSubgraphNode` function.







[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r460555053



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -360,6 +405,17 @@ void SelectSubgraphNodes(nnvm::Graph* g, SubgraphSelectorV2Ptr subgraph_selector
     // filter out unqualified pre-selected nodes
     std::vector<BiDirectedNode*> filtered_nodes = subgraph_selector->Filter(preselected_nodes);
 
+    const SubgraphPropertyPtr& subg_prop = g->GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      // check if subgraph has external input.
+      // if not, reject the first op (in top order) from the subgraph
+      // to make sure CachedOp gets external input.
+      if (filtered_nodes.size() > 0 && !HasInputEntries(*g, simple_nodes, filtered_nodes)) {
+        filtered_nodes.erase(filtered_nodes.begin());

Review comment:
       Yes, this is handled by the filter function in our subgraph property. 







[GitHub] [incubator-mxnet] szha commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r519246942



##########
File path: tests/python/unittest/test_dynamic_shape.py
##########
@@ -47,3 +47,26 @@ def hybrid_forward(self, F, data, index):
     assert_almost_equal(result.asnumpy(), result_nd)
     assert_almost_equal(data.grad.asnumpy(), data_grad_nd)
 
+def test_dynamic_shape_with_reshape():
+    # test dynamic shape op followed by reshape op
+    class _TestBlock(gluon.HybridBlock):
+
+        def __init__(self):
+            super(_TestBlock, self).__init__()
+
+        def hybrid_forward(self, F, data, index):
+            return F.contrib.boolean_mask(data, index).reshape((-1, ))

Review comment:
       Let's also add test cases for the deferred-compute-based implementation.







[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-659004405


   Unauthorized access detected. 
   Only the following 3 categories can trigger CI: 
   PR Author, MXNet Committer, Jenkins Admin.





[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r460570318



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -360,6 +405,17 @@ void SelectSubgraphNodes(nnvm::Graph* g, SubgraphSelectorV2Ptr subgraph_selector
     // filter out unqualified pre-selected nodes
     std::vector<BiDirectedNode*> filtered_nodes = subgraph_selector->Filter(preselected_nodes);
 
+    const SubgraphPropertyPtr& subg_prop = g->GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      // check if subgraph has external input.
+      // if not, reject the first op (in top order) from the subgraph
+      // to make sure CachedOp gets external input.
+      if (filtered_nodes.size() > 0 && !HasInputEntries(*g, simple_nodes, filtered_nodes)) {
+        filtered_nodes.erase(filtered_nodes.begin());

Review comment:
       Can you double-check? We don't run the filter function after removing the first node from the subgraph. In the case when the preselected nodes contain exactly two ops, the filter function does not do anything, right? And then if one node gets removed, do we discard the resulting subgraph containing a single node? Can you show me how that works?







[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r517080275



##########
File path: python/mxnet/gluon/block.py
##########
@@ -1243,6 +1308,7 @@ def hybridize(self, active=True, backend=None, backend_opts=None, clear=True, **
             self._backend_opts = backend_opts
 
         self._active = active
+        self._partition_if_dynamic = partition_if_dynamic
         self._flags = list(kwargs.items())
         if clear:
             self._clear_cached_op()

Review comment:
       Don't we need to clear `_cached_op_args` here too?







[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r516988225



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -329,7 +368,21 @@ void PreSelectSubgraphNodes(const nnvm::Graph& g, SubgraphSelectorV2Ptr subgraph
     }
     ++count;
   }
-  if (!success) {
+  if (success) {
+    // CachedOp requires external input during forward pass
+    // check subgraph input. If none, reject the first op (in top order) from the subgraph
+    // to make sure CachedOp gets external input.
+    // this feature can be switched off by setting ensure_CachedOp_input to false
+    const SubgraphPropertyPtr& subg_prop = g.GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      if (subgraph_nodes->size() > 0 && !HasInputEntries(g, simple_nodes, *subgraph_nodes)) {
+        // relabel the node to -1
+        (*subgraph_nodes)[0]->label = -1;
+        subgraph_nodes->erase(subgraph_nodes->begin());

Review comment:
       Here we wanted to check whether the CachedOp has external input, which requires both the entire graph and the subgraph. 
   It seems non-trivial to move this functionality to `Filter`, as it is not easy to pass that info there. 







[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r454730389



##########
File path: include/mxnet/c_api.h
##########
@@ -3262,6 +3262,17 @@ MXNET_DLL int MXEnginePushSyncND(EngineSyncFunc sync_func, void* func_param,
                                  EngineFnPropertyHandle prop_handle DEFAULT(NULL),
                                  int priority DEFAULT(0), const char* opr_name DEFAULT(NULL));
 
+/*!
+ * \brief This function first checks if any dynamic shape op is present in the symbol.
+ *        If yes, it partitions all static shape ops for execution optimization.
+ * \param sym_handle handle of the symbol.
+ * \param has_dynamic_shape status handle; true means a dynamic shape op is present, false otherwise.
+ * \param ret_sym_handle handle of the result symbol
+ */
+MXNET_DLL int MXPartitionForStaticShapeOps(SymbolHandle sym_handle, 
+                                           bool* has_dynamic_shape, 

Review comment:
       remove this







[GitHub] [incubator-mxnet] leezu commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
leezu commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-660317004


   This is broken. You can try the following slight extension of `test_dynamic_shape.py`:
   
   ``` diff
   modified   tests/python/unittest/test_dynamic_shape.py
   @@ -33,7 +33,7 @@ def test_dynamic_shape():
                super(_TestBlock, self).__init__()
    
            def hybrid_forward(self, F, data, index):
   -            return F.contrib.boolean_mask(data, index)
   +            return F.contrib.boolean_mask(data, index).reshape((-1, ))
    
        block = _TestBlock()
        block.hybridize()
   ```
   
   For your convenience, I added a commit to this PR adding the extra `reshape` statement.





[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r456657714



##########
File path: src/operator/subgraph/static_shape_subgraph_property.cc
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "./common.h"
+#include "./subgraph_property.h"
+#include "../../imperative/cached_op.h"
+
+namespace mxnet {
+namespace op {
+
+/*
+ * This selects nodes for a subgraph that only contains static shape operators
+ * and it visits nodes via both input and output links.
+ */
+class StaticShapeOpSelector: public SubgraphSelector {
+ public:
+  virtual bool Select(const nnvm::Node &seed_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !seed_node.is_variable() && infershape.count(seed_node.op());
+  }
+
+  virtual bool SelectInput(const nnvm::Node &cur_node, const nnvm::Node &input_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !input_node.is_variable() && infershape.count(input_node.op());
+  }
+
+  virtual bool SelectOutput(const nnvm::Node &cur_node, const nnvm::Node &output_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !output_node.is_variable() && infershape.count(output_node.op());
+  }
+};
+
+/*
+ * This subgraph property finds a subgraph whose nodes have only static shape operators.
+ * The operators in the subgraph will be executed by _CachedOp.
+ */
+class StaticShapeSubgraphProperty: public SubgraphProperty {
+ public:
+  static SubgraphPropertyPtr Create() { return std::make_shared<StaticShapeSubgraphProperty>(); }
+
+  // the criteria of selecting the subgraph nodes
+  virtual SubgraphSelectorPtr CreateSubgraphSelector() const {
+    return std::make_shared<StaticShapeOpSelector>();
+  }
+
+  // create an nnvm node for a given subgraph
+  virtual nnvm::ObjectPtr CreateSubgraphNode(const nnvm::Symbol &sym,
+                                             const int subgraph_id = 0) const {
+    nnvm::ObjectPtr n = nnvm::Node::Create();
+    n->attrs.op = Op::Get("_CachedOp");

Review comment:
       Makes sense, will add it







[GitHub] [incubator-mxnet] mseth10 commented on pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
mseth10 commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-717507308


   > I left a few comments, can you guys clarify what your current todo list is, or is this PR done in your mind?
   
   Thanks for the review @samskalicky. Our current todo list includes evaluating 1) moving graph partitioning to the C++ CachedOp vs. 2) using `optimize_for` in `_call_cached_op`, and making those changes to this PR. Apart from this refactor, we do not plan to add any functionality to this PR.





[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r456570278



##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -2627,6 +2628,15 @@ def detach(self):
     def backward(self):
         raise NotImplementedForSymbol(self.backward, None)
 
+    def optimize_for_dynamic_shape_op(self):
+        """Check if any dynamic shape op presents in the symbol, if yes, partition all static shape ops for optimization
+        returns the optimized symbol.
+        """
+        out = SymbolHandle()
+        check_call(_LIB.MXOptimizeForDynamicShapeOp(self.handle, ctypes.byref(out)))
+        from .numpy import _Symbol as np_symbol

Review comment:
       I tried to put it at the top; it causes an import loop (circular import) error.
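   A minimal sketch of why the function-local import avoids the loop (module names follow the diff above):

    ```python
    def _optimize_for_dynamic_shape_op(self, is_np_array):
        # importing mxnet.symbol.numpy at module top level would make
        # mxnet.symbol.symbol and mxnet.symbol.numpy import each other at
        # load time; a function-local import defers it until both exist
        from .numpy import _Symbol as np_symbol
        ...
    ```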

##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -1470,6 +1470,10 @@ def optimize_for(self, backend, args=None, aux=None, ctx=None,
         ctx : Context, optional
             Device context, used to infer stypes
 
+        is_np_sym : boolean, optional
+            Output symbol type
+            - If true, output type is np symbol, otherwise nd symbol.
+

Review comment:
       Will be changing it

##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -2627,6 +2633,24 @@ def detach(self):
     def backward(self):
         raise NotImplementedForSymbol(self.backward, None)
 
+    def optimize_for_dynamic_shape_op(self, is_np_sym=False):

Review comment:
       Will change to private function







[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r520287358



##########
File path: tests/python/unittest/test_dynamic_shape.py
##########
@@ -47,3 +47,26 @@ def hybrid_forward(self, F, data, index):
     assert_almost_equal(result.asnumpy(), result_nd)
     assert_almost_equal(data.grad.asnumpy(), data_grad_nd)
 
+def test_dynamic_shape_with_reshape():
+    # test dynamic shape op followed by reshape op
+    class _TestBlock(gluon.HybridBlock):
+
+        def __init__(self):
+            super(_TestBlock, self).__init__()
+
+        def hybrid_forward(self, F, data, index):
+            return F.contrib.boolean_mask(data, index).reshape((-1, ))

Review comment:
       Added a unit test for deferred compute mode







[GitHub] [incubator-mxnet] sxjscience commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
sxjscience commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-670991882


   @mseth10 
   Here is an example of layer dropout as in the paper https://arxiv.org/pdf/1909.11556.pdf. You may use it for profiling.
   
   ```python
   import mxnet as mx
   from mxnet.gluon import nn
   mx.npx.set_np()
   
    def np_cond(F, pred, then_val, else_val):
        # contrib.cond expects callables for the two branches, so wrap the
        # precomputed values in lambdas
        out = F.contrib.cond(pred.as_nd_ndarray(),
                             lambda: then_val.as_nd_ndarray(),
                             lambda: else_val.as_nd_ndarray())
        return out.as_np_ndarray()
   
   
   
   class LayerDropMLP(nn.HybridBlock):
       def __init__(self, use_layer_drop,
                    layer_drop_ratio=0.1,
                    units=32,
                    num_layers=10):
           super().__init__()
           self._num_layers = num_layers
           self._use_layer_drop = use_layer_drop
           self._layer_drop_ratio = layer_drop_ratio
           self.layers = nn.HybridSequential()
           for i in range(num_layers):
               layer = nn.HybridSequential()
               layer.add(nn.Dense(units, in_units=units))
               layer.add(nn.Activation('tanh'))
               self.layers.add(layer)
   
       def hybrid_forward(self, F, x):
           out = x
           for i in range(self._num_layers):
               choose_new = F.np.random.uniform(0, 1) > self._layer_drop_ratio
               if F == mx.ndarray:
                   if choose_new.asnumpy():
                       out = self.layers[i](out)
               else:
                   forward_out = self.layers[i](out)
                   out = np_cond(F, choose_new.astype('float32'), forward_out, out)
           return out
   
   units = 32
   foo = LayerDropMLP(use_layer_drop=True, units=units, layer_drop_ratio=0.2)
   foo.initialize()
   foo.hybridize()
   out = foo(mx.np.random.normal(0, 1, (32, units), dtype=mx.np.float32))
   ```
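
   One way to profile the example above (a sketch using the standard `mx.profiler` API; the file name is arbitrary):

   ```python
   mx.profiler.set_config(profile_all=True, aggregate_stats=True,
                          filename='layer_drop_profile.json')
   mx.profiler.set_state('run')
   out = foo(mx.np.random.normal(0, 1, (32, units), dtype=mx.np.float32))
   mx.nd.waitall()             # ensure all asynchronous work is captured
   mx.profiler.set_state('stop')
   print(mx.profiler.dumps())  # aggregated per-operator statistics
   ```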


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r470777553



##########
File path: src/operator/subgraph/static_shape_subgraph_property.cc
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "./common.h"
+#include "./subgraph_property.h"
+#include "../../imperative/cached_op.h"
+
+namespace mxnet {
+namespace op {
+
+/*
+ * This selects nodes for a subgraph that only contains static shape operators
+ * and it visits nodes via both input and output links.
+ */
+class StaticShapeOpSelector: public SubgraphSelector {
+ public:
+  virtual bool Select(const nnvm::Node &seed_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !seed_node.is_variable() && infershape.count(seed_node.op());
+  }
+
+  virtual bool SelectInput(const nnvm::Node &cur_node, const nnvm::Node &input_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !input_node.is_variable() && infershape.count(input_node.op());
+  }
+
+  virtual bool SelectOutput(const nnvm::Node &cur_node, const nnvm::Node &output_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !output_node.is_variable() && infershape.count(output_node.op());
+  }
+};
+
+/*
+ * This subgraph property finds a subgraph whose nodes have only static shape operators.
+ * The operators in the subgraph will be executed by _CachedOp.
+ */
+class StaticShapeSubgraphProperty: public SubgraphProperty {
+ public:
+  static SubgraphPropertyPtr Create() { return std::make_shared<StaticShapeSubgraphProperty>(); }
+
+  // the criteria of selecting the subgraph nodes
+  virtual SubgraphSelectorPtr CreateSubgraphSelector() const {
+    return std::make_shared<StaticShapeOpSelector>();
+  }
+
+  // create an nnvm node for a given subgraph
+  virtual nnvm::ObjectPtr CreateSubgraphNode(const nnvm::Symbol &sym,
+                                             const int subgraph_id = 0) const {
+    nnvm::ObjectPtr n = nnvm::Node::Create();
+    n->attrs.op = Op::Get("_CachedOp");

Review comment:
       added




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r519123441



##########
File path: src/operator/subgraph/static_shape_subgraph_property.cc
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "./common.h"
+#include "./subgraph_property.h"
+#include "../../imperative/cached_op.h"
+
+namespace mxnet {
+namespace op {
+
+/*
+ * This selects nodes for a subgraph that only contains static shape operators
+ * and it visits nodes via both input and output links.
+ */
+class StaticShapeOpSelector: public SubgraphSelector {
+ public:
+  // select nodes with FInferShape attribute
+  bool Select(const nnvm::Node &seed_node) override {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !seed_node.is_variable() && infershape.count(seed_node.op()) &&
+           !unsupported_op_names_.count(seed_node.op()->name);
+  }
+
+  bool SelectInput(const nnvm::Node &cur_node, const nnvm::Node &input_node) override {
+    return Select(input_node);
+  }
+
+  bool SelectOutput(const nnvm::Node &cur_node, const nnvm::Node &output_node) override {
+    return Select(output_node);
+  }
+
+  // reject single node subgraph
+  std::vector<nnvm::Node*> Filter(const std::vector<nnvm::Node*>& candidates) override {
+    if (candidates.size() == 1) {
+      return std::vector<nnvm::Node*>();
+    }
+    return candidates;
+  }
+
+ private:
+    // static shape ops that fail backward pass inside subgraph CachedOp
+    // GitHub issue: https://github.com/apache/incubator-mxnet/issues/18823
+    std::unordered_set<std::string> unsupported_op_names_ {"Reshape", "_np_reshape", "transpose",

Review comment:
       You put it correctly; we can decide on providing that option based on user feedback. Also, we have filed issue #18823 for this, which we'll address in a separate PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
leezu commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-660321410


   Backtrace of the issue is below. I used the following patch to obtain the backtrace:
   
   ``` diff
   modified   src/c_api/c_api_ndarray.cc
   @@ -394,7 +394,6 @@ int MXAutogradBackwardEx(uint32_t num_output,
                             NDArrayHandle **grad_handles,
                             int **grad_stypes) {
      MXAPIThreadLocalEntry<> *ret = MXAPIThreadLocalStore<>::Get();
   -  API_BEGIN();
    
      std::vector<NDArray*> outputs, ograds, variables;
      outputs.reserve(num_output);
   @@ -430,7 +429,7 @@ int MXAutogradBackwardEx(uint32_t num_output,
        *grad_handles = dmlc::BeginPtr(ret->ret_handles);
        *grad_stypes = dmlc::BeginPtr(ret->out_types);
      }
   -  API_END();
   +  return 0;
    }
    
    int MXAutogradGetSymbol(NDArrayHandle handle, SymbolHandle *out) {
   ```
   
   and build via `cmake -GNinja -DCMAKE_BUILD_TYPE=Debug -DLOG_FATAL_THROW=0 -DUSE_CUDA=0 ..; ninja`
   
   
   Backtrace:
   
   ```
   Thread 1 "python3.8" received signal SIGABRT, Aborted.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
   51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 (gdb) bt
   #0  0x00007ffff705ef47 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
   #1  0x00007ffff70608b1 in __GI_abort () at abort.c:79
   #2  0x00007fff3737c257 in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
   #3  0x00007fff37387606 in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
   #4  0x00007fff37387671 in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
   #5  0x00007fff37387905 in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
   #6  0x00007fff3737e96b in std::__throw_bad_cast() () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
   #7  0x00007fff3ba064f7 in __gnu_cxx::new_allocator<nnvm::NodeEntry>::allocate(unsigned long, void const*) (this=0x555555e73ea0, __n=18446744073709459446) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ext/new_allocator.h:106
   #8  0x00007fff3ba064b4 in std::allocator_traits<std::allocator<nnvm::NodeEntry> >::allocate(std::allocator<nnvm::NodeEntry>&, unsigned long) (__a=..., __n=18446744073709459446) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/alloc_traits.h:460
   #9  0x00007fff3ba06423 in std::_Vector_base<nnvm::NodeEntry, std::allocator<nnvm::NodeEntry> >::_M_allocate(unsigned long) (this=0x555555e73ea0, __n=18446744073709459446) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:346
   #10 0x00007fff3ba060cc in std::vector<nnvm::NodeEntry, std::allocator<nnvm::NodeEntry> >::_M_allocate_and_copy<__gnu_cxx::__normal_iterator<nnvm::NodeEntry const*, std::vector<nnvm::NodeEntry, std::allocator<nnvm::NodeEntry> > > >(unsigned long, __gnu_cxx::__normal_iterator<nnvm::NodeEntry const*, std::vector<nnvm::NodeEntry, std::allocator<nnvm::NodeEntry> > >, __gnu_cxx::__normal_iterator<nnvm::NodeEntry const*, std::vector<nnvm::NodeEntry, std::allocator<nnvm::NodeEntry> > >) (this=0x555555e73ea0, __n=18446744073709459446, __first={node = <error reading variable: Cannot access memory at address 0xf0000000a>, index = 15, version = 3707764736}, __last={node = <error reading variable: Cannot access memory at address 0x100000010>, index = 1470775728, version = 21845}) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:1511
   #11 0x00007fff3ba04c07 in std::vector<nnvm::NodeEntry, std::allocator<nnvm::NodeEntry> >::operator=(std::vector<nnvm::NodeEntry, std::allocator<nnvm::NodeEntry> > const&) (this=0x555555e73ea0, __x=std::vector of length -92170, capacity 12802 = {...}) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:226
   #12 0x00007fff3bc74a44 in nnvm::Graph::operator=(nnvm::Graph const&) (this=0x555555e73ea0) at ../include/nnvm/graph.h:46
   #13 0x00007fff3bc60133 in mxnet::CachedOp::DynamicBackward(bool, mxnet::OpStatePtr const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (this=0x555557aa41b0, retain_graph=false, op_state=..., inputs=std::vector of length 3, capacity 3 = {...}, reqs=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...}) at ../src/imperative/cached_op.cc:853
   #14 0x00007fff3bc6262b in mxnet::CachedOp::Backward(bool, mxnet::OpStatePtr const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&) (this=0x555557aa41b0, retain_graph=false, state=..., inputs=std::vector of length 3, capacity 3 = {...}, reqs=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...}) at ../src/imperative/cached_op.cc:1048
   #15 0x00007fff3bcdd29d in (anonymous namespace)::InvokeOperator(nnvm::IndexedGraph const&, int, bool, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::Context, std::vector<mxnet::OpStatePtr, std::allocator<mxnet::OpStatePtr> >*, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> >*, std::vector<unsigned int, std::allocator<unsigned int> >*, std::function<void (mxnet::OpStatePtr const&)>) (idx=..., node_idx=5, retain_graph=false, arrays=std::vector of length 8, capacity 8 = {...}, ctx=..., p_states=0x7ffffffccfa8, ndinputs=std::vector of length 3, capacity 3 = {...}, ndoutputs=std::vector of length 1, capacity 1 = {...}, p_req=0x7ffffffcc328, p_ref_count=0x7ffffffccfc8, invoke=...) at ../src/imperative/imperative_utils.cc:91
   #16 0x00007fff3bcdcaca in mxnet::imperative::RunGraph(bool, nnvm::IndexedGraph const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, unsigned long, unsigned long, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> >&&, std::vector<unsigned int, std::allocator<unsigned int> >&&, std::vector<mxnet::OpStatePtr, std::allocator<mxnet::OpStatePtr> >*, std::vector<mxnet::DispatchMode, std::allocator<mxnet::DispatchMode> > const&, bool, std::vector<mxnet::TShape, std::allocator<mxnet::TShape> >*, std::function<void (char const*, char const*, void*)> const&, bool) (retain_graph=false, idx=..., arrays=std::vector of length 8, capacity 8 = {...}, node_start=4, node_end=7, array_reqs=..., ref_count=..., p_states=0x7ffffffccfa8, dispatch_modes=std::vector of length 7, capacity 7 = {...}, recording=false, shapes=0x0, callback=..., monitor_all=false)
       at ../src/imperative/imperative_utils.cc:165
   #17 0x00007fff3bcbe53c in mxnet::Imperative::Backward(std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, bool, bool, bool) (this=0x7fff501e8e78 <mxnet::Imperative::Get()::inst>, outputs=std::vector of length 1, capacity 1 = {...}, ograds=std::vector of length 1, capacity 1 = {...}, variables=std::vector of length 0, capacity 0, is_train=true, retain_graph=false, create_graph=false) at ../src/imperative/imperative.cc:616
   #18 0x00007fff3bab5a3f in MXAutogradBackwardEx(uint32_t, NDArrayHandle*, NDArrayHandle*, uint32_t, NDArrayHandle*, int, int, int, NDArrayHandle**, int**) (num_output=1, output_handles=0x7ffda003bbe0, ograd_handles=0x7ffda003bd20, num_variables=0, var_handles=0x0, retain_graph=0, create_graph=0, is_train=1, grad_handles=0x0, grad_stypes=0x0) at ../src/c_api/c_api_ndarray.cc:418
   #19 0x00007ffff2f65dae in ffi_call_unix64 () at /usr/lib/x86_64-linux-gnu/libffi.so.6
   #20 0x00007ffff2f6571f in ffi_call () at /usr/lib/x86_64-linux-gnu/libffi.so.6
   #21 0x00007ffff317d415 in _call_function_pointer (flags=4353, pProc=0x7fff3bab57c0 <MXAutogradBackwardEx(uint32_t, NDArrayHandle*, NDArrayHandle*, uint32_t, NDArrayHandle*, int, int, int, NDArrayHandle**, int**)>, avalues=0x7ffffffcd9c0, atypes=0x7ffffffcd960, restype=0x7ffff35bb9f8, resmem=0x7ffffffcda20, argcount=10) at /tmp/python-build.20200514035455.63369/Python-3.8.2/Modules/_ctypes/callproc.c:871
   #22 0x00007ffff317de19 in _ctypes_callproc (pProc=0x7fff3bab57c0 <MXAutogradBackwardEx(uint32_t, NDArrayHandle*, NDArrayHandle*, uint32_t, NDArrayHandle*, int, int, int, NDArrayHandle**, int**)>, argtuple=(1, <c_void_p_Array_1 at remote 0x7ffda003bb90>, <c_void_p_Array_1 at remote 0x7ffda003bcd0>, 0, <c_void_p at remote 0x7ffda003be10>, <c_int at remote 0x7ffda003beb0>, <c_int at remote 0x7ffda003bf50>, <c_int at remote 0x7ffdc19de050>, <c_void_p at remote 0x7ffdc19de0f0>, <c_void_p at remote 0x7ffdc19de190>), flags=4353, argtypes=0x0, restype=<_ctypes.PyCSimpleType at remote 0x555555e7bce0>, checker=0x0) at /tmp/python-build.20200514035455.63369/Python-3.8.2/Modules/_ctypes/callproc.c:1199
   #23 0x00007ffff3178169 in PyCFuncPtr_call (self=0x7ffda004cc90, inargs=(1, <c_void_p_Array_1 at remote 0x7ffda003bb90>, <c_void_p_Array_1 at remote 0x7ffda003bcd0>, 0, <c_void_p at remote 0x7ffda003be10>, <c_int at remote 0x7ffda003beb0>, <c_int at remote 0x7ffda003bf50>, <c_int at remote 0x7ffdc19de050>, <c_void_p at remote 0x7ffdc19de0f0>, <c_void_p at remote 0x7ffdc19de190>), kwds=0x0) at /tmp/python-build.20200514035455.63369/Python-3.8.2/Modules/_ctypes/_ctypes.c:4201
   ...
   ```
   
   Specifically note the `Cannot access memory at address 0xf0000000a`
   
   To obtain the backtrace, run `gdb /path/to/python`, then `(gdb) run -m pytest --color=yes --verbose --exitfirst ./tests/python/unittest/test_dynamic_shape.py`, and then `bt`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r460503949



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -360,6 +405,17 @@ void SelectSubgraphNodes(nnvm::Graph* g, SubgraphSelectorV2Ptr subgraph_selector
     // filter out unqualified pre-selected nodes
     std::vector<BiDirectedNode*> filtered_nodes = subgraph_selector->Filter(preselected_nodes);
 
+    const SubgraphPropertyPtr& subg_prop = g->GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      // check if subgraph has external input.
+      // if not, reject the first op (in top order) from the subgraph
+      // to make sure CachedOp gets external input.
+      if (filtered_nodes.size() > 0 && !HasInputEntries(*g, simple_nodes, filtered_nodes)) {
+        filtered_nodes.erase(filtered_nodes.begin());

Review comment:
       @waytrue17 Are we handling the case where this leads to `filtered_nodes` containing only one node? We should ideally reject such a subgraph.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-665576875


   @leezu thanks for providing us with an example to reproduce the memory error. After investigation, it turns out that there is an issue with the reshape operator itself, such that the backward pass fails on subgraphs containing the reshape op. It is not related to the changes made in this PR. I have created GitHub issue #18823 for it. A similar failure is observed for the transpose operator.

   Currently, since we no longer allow subgraphs containing a single op, your change `return F.contrib.boolean_mask(data, index).reshape((-1, ))` does not produce any subgraphs anymore and does not trigger the same error. Hence, we'll revert your change, as the test now fails due to checks that no longer apply.
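
   For reference, a minimal repro of the underlying failure, independent of this PR (a sketch based on the description above, not copied from issue #18823; the ops and shapes are illustrative):

   ```python
   import mxnet as mx
   from mxnet import autograd, gluon

   class Repro(gluon.HybridBlock):
       def hybrid_forward(self, F, data, index):
           # dynamic shape op followed by two static shape ops, so the
           # static ops form a subgraph executed by _CachedOp
           out = F.contrib.boolean_mask(data, index)
           return F.transpose(F.reshape(out, (-1, 1)))

   block = Repro()
   block.hybridize()
   data = mx.nd.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
   index = mx.nd.array([0, 1, 1])
   data.attach_grad()
   with autograd.record():
       out = block(data, index)
   out.backward()  # fails while reshape/transpose sit inside the subgraph
   ```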


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu edited a comment on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-660317004


   This isn't working yet. You can try the following slight extension of `test_dynamic_shape.py`:
   
   ``` diff
   modified   tests/python/unittest/test_dynamic_shape.py
   @@ -33,7 +33,7 @@ def test_dynamic_shape():
                super(_TestBlock, self).__init__()
    
            def hybrid_forward(self, F, data, index):
   -            return F.contrib.boolean_mask(data, index)
   +            return F.contrib.boolean_mask(data, index).reshape((-1, ))
    
        block = _TestBlock()
        block.hybridize()
   ```
   
   For your convenience, I add a commit to this PR to add the extra `reshape` statement.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r454728340



##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -2513,6 +2513,18 @@ def detach(self):
     def backward(self):
         raise NotImplementedForSymbol(self.backward, None)
 
+    def partition_for_static_shape_ops(self):
+        """Check if any dynamic shape ops present in the symbol, if yes, partition static shape ops for optimization
+        returns the optimized symbol.
+        """
+        out = SymbolHandle()
+        has_dynamic_shape_ops = ctypes.c_bool(False)

Review comment:
       remove this variable




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-664045955


   @samskalicky it would be great to have your review on this


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r513039088



##########
File path: include/mxnet/base.h
##########
@@ -532,6 +532,9 @@ inline std::ostream& operator<<(std::ostream &out, const Context &ctx) {
   return out;
 }
 
+// convert nnvm symbol to a nnvm graph.
+nnvm::Graph Symbol2Graph(const nnvm::Symbol &s);

Review comment:
       Yes, will remove it within the next commit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r464799273



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -88,10 +88,12 @@ void CreateSimpleGraph(const nnvm::Graph& g,
         it->second.push_back(i);
       }
     }
-    // store control flow ops in control_flow_nodes
-    if (!node->is_variable() && (node->op()->name.compare("_while_loop") == 0
-        || node->op()->name.compare("_cond") == 0)) {
-      control_flow_nodes->emplace_back(node);
+    // store nodes to be partitioned recursively
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    if (!node->is_variable() && !infershape.count(node->op()) && 
+      node->attrs.subgraphs.size() > 0) {
+      std::cout << node->op()->name << std::endl;

Review comment:
       remove this




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r460585378



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -360,6 +405,17 @@ void SelectSubgraphNodes(nnvm::Graph* g, SubgraphSelectorV2Ptr subgraph_selector
     // filter out unqualified pre-selected nodes
     std::vector<BiDirectedNode*> filtered_nodes = subgraph_selector->Filter(preselected_nodes);
 
+    const SubgraphPropertyPtr& subg_prop = g->GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      // check if subgraph has external input.
+      // if not, reject the first op (in top order) from the subgraph
+      // to make sure CachedOp gets external input.
+      if (filtered_nodes.size() > 0 && !HasInputEntries(*g, simple_nodes, filtered_nodes)) {
+        filtered_nodes.erase(filtered_nodes.begin());

Review comment:
       Thanks for bringing this up! I agree that the filter function should be called after removing the first node, not before. I'll include the change in the next commit.
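
       In pseudocode, the intended ordering looks like this (the helper names are illustrative, mirroring the C++ in `build_subgraph.cc`):

       ```python
       # ensure the external-input check runs before the single-node filter,
       # so Filter() sees the final candidate set
       candidates = preselect_subgraph_nodes(graph)
       if candidates and not has_input_entries(graph, candidates):
           candidates = candidates[1:]       # drop first node in topo order
       candidates = subgraph_selector.filter(candidates)  # rejects size-1 sets
       ```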




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r518441321



##########
File path: python/mxnet/gluon/block.py
##########
@@ -1489,7 +1510,7 @@ def reset_ctx(self, ctx):
             copy will be made for each context.
         """
         params = self.collect_params()
-        if self._cached_op:
+        if self._cached_op_args:

Review comment:
       Reverted this change




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r518417379



##########
File path: python/mxnet/gluon/block.py
##########
@@ -1489,7 +1510,7 @@ def reset_ctx(self, ctx):
             copy will be made for each context.
         """
         params = self.collect_params()
-        if self._cached_op:
+        if self._cached_op_args:

Review comment:
       why do we need this change?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
mseth10 commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-723401245


   > Why are we updating mkldnn here?
   
   There was an issue with my setup: `git submodule update` checked out a different commit of mkldnn. I have now done a hard reset to the master version.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r512868503



##########
File path: python/mxnet/gluon/block.py
##########
@@ -1094,8 +1094,57 @@ def _deferred_infer_shape(self, *args):
             raise ValueError(error_msg)
 
     def _call_cached_op(self, *args):
-        if self._cached_op is None:
+        if not self._cached_op_args or not self._cached_graph:
             self._build_cache(*args)
+
+        if self._first_forward:
+            # partition static shape ops if the graph contains any dynamic shape op
+            self._first_forward = False
+            data, out = self._cached_graph
+            out, is_dynamic = out._optimize_for_dynamic_shape_op(is_np_array(), self._flags)
+            self._cached_graph = data, out

Review comment:
       We can't override `self._cached_graph` with the dynamic-shape-partitioned graph, since users can call `optimize_for` at any time after the model is hybridized. We don't want users to end up optimizing the dynamic-shape-partitioned version.

       Are you planning to push this down into the creation of the C++ CachedOp, or to take some other approach to resolve this issue?
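
       One possible direction (a sketch only; `_partitioned_cached_graph` is a hypothetical attribute, not part of this PR) would be to keep the partitioned graph separate, so `optimize_for` always starts from the original graph:

       ```python
       if self._first_forward:
           self._first_forward = False
           data, out = self._cached_graph
           part_out, is_dynamic = out._optimize_for_dynamic_shape_op(is_np_array(), self._flags)
           if is_dynamic:
               # hypothetical: store separately instead of overwriting _cached_graph
               self._partitioned_cached_graph = (data, part_out)
       ```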




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r519110874



##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -2650,6 +2650,15 @@ def detach(self):
     def backward(self):
         raise NotImplementedForSymbol(self.backward, None)
 
+
+    def _check_dynamic_shape_op(self):

Review comment:
       nit: if you need to make any other changes, consider renaming this to `has_dynamic_shape_op` and editing the docstring to say the return value is boolean.
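
       The renamed method could look like this sketch (assumption: `MXHasDynamicShapeOp` stands in for whatever C API the current `_check_dynamic_shape_op` calls, which is not shown in this hunk):

       ```python
       def has_dynamic_shape_op(self):
           """Return True if the symbol contains at least one dynamic shape op."""
           has_dyn = ctypes.c_bool(False)
           check_call(_LIB.MXHasDynamicShapeOp(self.handle, ctypes.byref(has_dyn)))
           return has_dyn.value
       ```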




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] sxjscience edited a comment on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
sxjscience edited a comment on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-670991882


   @mseth10 
   Here is an example of layer dropout as in the paper https://arxiv.org/pdf/1909.11556.pdf. You may use this for profiling.
   
   ```python
   import mxnet as mx
   from mxnet.gluon import nn
   mx.npx.set_np()
   
   def np_cond(F, pred, then_func, else_func):
       out = F.contrib.cond(pred.as_nd_ndarray(), then_func.as_nd_ndarray(), else_func.as_nd_ndarray())
       return out.as_np_ndarray()
   
   
   
   class LayerDropMLP(nn.HybridBlock):
       def __init__(self, use_layer_drop,
                    layer_drop_ratio=0.1,
                    units=32,
                    num_layers=10):
           super().__init__()
           self._num_layers = num_layers
           self._use_layer_drop = use_layer_drop
           self._layer_drop_ratio = layer_drop_ratio
           self.layers = nn.HybridSequential()
           for i in range(num_layers):
               layer = nn.HybridSequential()
               layer.add(nn.Dense(units, in_units=units))
               layer.add(nn.Activation('tanh'))
               self.layers.add(layer)
   
       def hybrid_forward(self, F, x):
           out = x
           for i in range(self._num_layers):
               choose_new = F.np.random.uniform(0, 1) > self._layer_drop_ratio
               if F == mx.ndarray:
                   if choose_new.asnumpy():
                       out = self.layers[i](out)
               else:
                   forward_out = self.layers[i](out)
                   out = np_cond(F, choose_new.astype('float32'), forward_out, out)
           return out
   
   units = 32
   foo = LayerDropMLP(use_layer_drop=True, units=units, layer_drop_ratio=0.2)
   foo.initialize()
   foo.hybridize()
   out = foo(mx.np.random.normal(0, 1, (32, units), dtype=mx.np.float32))
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-659004391


   @mxnet-bot run ci [all]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-665993052


   > Thanks @mseth10 for investigating the root cause. Could you add a new test that contains the `reshape` after dynamic shape?

   Test case added. Thanks for your suggestions and help on this issue.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r456648950



##########
File path: src/operator/subgraph/static_shape_subgraph_property.cc
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "./common.h"
+#include "./subgraph_property.h"
+#include "../../imperative/cached_op.h"
+
+namespace mxnet {
+namespace op {
+
+/*
+ * This selects nodes for a subgraph that only contains static shape operators
+ * and it visits nodes via both input and output links.
+ */
+class StaticShapeOpSelector: public SubgraphSelector {
+ public:
+  virtual bool Select(const nnvm::Node &seed_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !seed_node.is_variable() && infershape.count(seed_node.op());
+  }
+
+  virtual bool SelectInput(const nnvm::Node &cur_node, const nnvm::Node &input_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !input_node.is_variable() && infershape.count(input_node.op());
+  }
+
+  virtual bool SelectOutput(const nnvm::Node &cur_node, const nnvm::Node &output_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !output_node.is_variable() && infershape.count(output_node.op());
+  }
+};
+
+/*
+ * This subgraph property finds a subgraph whose nodes have only static shape operators.
+ * The operators in the subgraph will be executed by _CachedOp.
+ */
+class StaticShapeSubgraphProperty: public SubgraphProperty {
+ public:
+  static SubgraphPropertyPtr Create() { return std::make_shared<StaticShapeSubgraphProperty>(); }
+
+  // the criteria of selecting the subgraph nodes
+  virtual SubgraphSelectorPtr CreateSubgraphSelector() const {
+    return std::make_shared<StaticShapeOpSelector>();
+  }
+
+  // create an nnvm node for a given subgraph
+  virtual nnvm::ObjectPtr CreateSubgraphNode(const nnvm::Symbol &sym,
+                                             const int subgraph_id = 0) const {
+    nnvm::ObjectPtr n = nnvm::Node::Create();
+    n->attrs.op = Op::Get("_CachedOp");

Review comment:
       Should propagate the `static_shape`, `static_alloc` settings from outer CachedOp?
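
       For context, the `static_alloc`/`static_shape` settings originate from the user's `hybridize` call (standard Gluon API); the open question is whether the subgraph property should copy them onto the inner `_CachedOp` node it creates:

       ```python
       import mxnet as mx
       from mxnet import gluon

       net = gluon.nn.Dense(10)
       net.initialize()
       # flags set here apply to the *outer* CachedOp; the inner subgraph
       # _CachedOp created in CreateSubgraphNode would need them forwarded
       net.hybridize(static_alloc=True, static_shape=True)
       out = net(mx.nd.ones((2, 5)))
       ```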




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-717419249


   I left a few comments. Can you clarify what your current todo list is, or is this PR done in your mind?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] chinakook commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
chinakook commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-663594081


   I think this is greatly needed and we should develop it further, because the competing framework py*ch has a mature JIT system for handling dynamic ops and control flow, which makes model deployment very easy and convenient.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r486080444



##########
File path: src/operator/subgraph/static_shape_subgraph_property.cc
##########
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "./common.h"
+#include "./subgraph_property.h"
+#include "../../imperative/cached_op.h"
+
+namespace mxnet {
+namespace op {
+
+/*
+ * This selects nodes for a subgraph that only contains static shape operators
+ * and it visits nodes via both input and output links.
+ */
+class StaticShapeOpSelector: public SubgraphSelector {
+ public:
+  // select nodes with FInferShape attribute
+  bool Select(const nnvm::Node &seed_node) override {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !seed_node.is_variable() && infershape.count(seed_node.op()) &&
+           !unsupported_op_names_.count(seed_node.op()->name);
+  }
+
+  bool SelectInput(const nnvm::Node &cur_node, const nnvm::Node &input_node) override {
+    return Select(input_node);
+  }
+
+  bool SelectOutput(const nnvm::Node &cur_node, const nnvm::Node &output_node) override {
+    return Select(output_node);
+  }
+
+  // reject single node subgraph
+  std::vector<nnvm::Node*> Filter(const std::vector<nnvm::Node*>& candidates) override {
+    if (candidates.size() == 1) {
+      return std::vector<nnvm::Node*>();
+    }
+    return candidates;
+  }
+
+ private:
+    // static shape ops that fail backward pass inside subgraph CachedOp
+    // GitHub issue: https://github.com/apache/incubator-mxnet/issues/18823
+    std::unordered_set<std::string> unsupported_op_names_ {"Reshape", "_np_reshape", "transpose",
+                                                           "_npi_dstack", "_npi_hstack"};
+};
+
+/*
+ * This subgraph property finds a subgraph whose nodes have only static shape operators.
+ * The operators in the subgraph will be executed by _CachedOp.
+ */
+class StaticShapeSubgraphProperty: public SubgraphProperty {
+ public:
+  StaticShapeSubgraphProperty() {
+    // flag to recursively partition dynamic shape op nodes containing subgraphs
+    attrs_["recursive_partition"] = std::make_shared<dmlc::any>(true);
+    // flag to ensure subgraph CachedOp has at least one external input
+    // as required by CachedOp::Forward
+    attrs_["ensure_CachedOp_input"] = std::make_shared<dmlc::any>(true);
+  }
+  static SubgraphPropertyPtr Create() { return std::make_shared<StaticShapeSubgraphProperty>(); }
+
+  SubgraphSelectorPtr CreateSubgraphSelector() const override {
+    return std::make_shared<StaticShapeOpSelector>();
+  }
+
+  void PrePartition(const nnvm::Graph& g,
+                    const std::unordered_map<std::string, std::string>& options_map) override {
+    // update static_alloc and static_shape flags
+    if (options_map_.size() == 0) {
+      for (auto& kv : options_map) {
+        options_map_.emplace_back(kv);
+      }
+    }
+    if (param_name_set_.size() > 0) {
+      param_name_set_.clear();
+    }
+    // generate param_name_set_ for data_indices and param_indices
+    if (g.HasAttr("param_indices_list")) {
+      const std::vector<int>& param_indices_list
+          = g.GetAttr<std::vector<int>>("param_indices_list");
+      const auto& indexed_graph = g.indexed_graph();
+      for (const auto index : param_indices_list) {
+          auto nid = indexed_graph.input_nodes()[index];
+          nnvm::Node n = *(indexed_graph[nid].source);
+          param_name_set_.emplace(n.attrs.name);
+      }
+    }
+  }
+
+  void InitSubgraphInputs(std::vector<nnvm::NodeEntry*>* input_entries,
+                          std::vector<nnvm::NodeEntry>* orig_input_entries) override {

Review comment:
       add `const`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha merged pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
szha merged pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r486095732



##########
File path: src/operator/subgraph/static_shape_subgraph_property.cc
##########
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "./common.h"
+#include "./subgraph_property.h"
+#include "../../imperative/cached_op.h"
+
+namespace mxnet {
+namespace op {
+
+/*
+ * This selects nodes for a subgraph that only contains static shape operators
+ * and it visits nodes via both input and output links.
+ */
+class StaticShapeOpSelector: public SubgraphSelector {
+ public:
+  // select nodes with FInferShape attribute
+  bool Select(const nnvm::Node &seed_node) override {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !seed_node.is_variable() && infershape.count(seed_node.op()) &&
+           !unsupported_op_names_.count(seed_node.op()->name);
+  }
+
+  bool SelectInput(const nnvm::Node &cur_node, const nnvm::Node &input_node) override {
+    return Select(input_node);
+  }
+
+  bool SelectOutput(const nnvm::Node &cur_node, const nnvm::Node &output_node) override {
+    return Select(output_node);
+  }
+
+  // reject single node subgraph
+  std::vector<nnvm::Node*> Filter(const std::vector<nnvm::Node*>& candidates) override {
+    if (candidates.size() == 1) {
+      return std::vector<nnvm::Node*>();
+    }
+    return candidates;
+  }
+
+ private:
+    // static shape ops that fail backward pass inside subgraph CachedOp
+    // GitHub issue: https://github.com/apache/incubator-mxnet/issues/18823
+    std::unordered_set<std::string> unsupported_op_names_ {"Reshape", "_np_reshape", "transpose",
+                                                           "_npi_dstack", "_npi_hstack"};
+};
+
+/*
+ * This subgraph property finds a subgraph whose nodes have only static shape operators.
+ * The operators in the subgraph will be executed by _CachedOp.
+ */
+class StaticShapeSubgraphProperty: public SubgraphProperty {
+ public:
+  StaticShapeSubgraphProperty() {
+    // flag to recursively partition dynamic shape op nodes containing subgraphs
+    attrs_["recursive_partition"] = std::make_shared<dmlc::any>(true);
+    // flag to ensure subgraph CachedOp has at least one external input
+    // as required by CachedOp::Forward
+    attrs_["ensure_CachedOp_input"] = std::make_shared<dmlc::any>(true);
+  }
+  static SubgraphPropertyPtr Create() { return std::make_shared<StaticShapeSubgraphProperty>(); }
+
+  SubgraphSelectorPtr CreateSubgraphSelector() const override {
+    return std::make_shared<StaticShapeOpSelector>();
+  }
+
+  void PrePartition(const nnvm::Graph& g,
+                    const std::unordered_map<std::string, std::string>& options_map) override {
+    // update static_alloc and static_shape flags
+    if (options_map_.size() == 0) {
+      for (auto& kv : options_map) {
+        options_map_.emplace_back(kv);
+      }
+    }
+    if (param_name_set_.size() > 0) {
+      param_name_set_.clear();
+    }
+    // generate param_name_set_ for data_indices and param_indices
+    if (g.HasAttr("param_indices_list")) {
+      const std::vector<int>& param_indices_list
+          = g.GetAttr<std::vector<int>>("param_indices_list");
+      const auto& indexed_graph = g.indexed_graph();
+      for (const auto index : param_indices_list) {
+          auto nid = indexed_graph.input_nodes()[index];
+          nnvm::Node n = *(indexed_graph[nid].source);
+          param_name_set_.emplace(n.attrs.name);
+      }
+    }
+  }
+
+  void InitSubgraphInputs(std::vector<nnvm::NodeEntry*>* input_entries,
+                          std::vector<nnvm::NodeEntry>* orig_input_entries) override {

Review comment:
       Added, thanks for pointing this out


[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r460610246



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -360,6 +405,17 @@ void SelectSubgraphNodes(nnvm::Graph* g, SubgraphSelectorV2Ptr subgraph_selector
     // filter out unqualified pre-selected nodes
     std::vector<BiDirectedNode*> filtered_nodes = subgraph_selector->Filter(preselected_nodes);
 
+    const SubgraphPropertyPtr& subg_prop = g->GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      // check if subgraph has external input.
+      // if not, reject the first op (in top order) from the subgraph
+      // to make sure CachedOp gets external input.
+      if (filtered_nodes.size() > 0 && !HasInputEntries(*g, simple_nodes, filtered_nodes)) {
+        filtered_nodes.erase(filtered_nodes.begin());

Review comment:
       Moved `ensure_CachedOp_input` inside `PreSelectSubgraphNodes`


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r519110874



##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -2650,6 +2650,15 @@ def detach(self):
     def backward(self):
         raise NotImplementedForSymbol(self.backward, None)
 
+
+    def _check_dynamic_shape_op(self):

Review comment:
       nit: if you need to make any other changes, consider changing this to "has_dynamic_shape_op"
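
       For reference, the renamed method could look roughly like this (a minimal sketch; `MXSymbolHasDynamicShapeOp` is a hypothetical name standing in for whatever C API entry point the check actually calls):

           def has_dynamic_shape_op(self):
               """Return True if this symbol contains at least one dynamic shape op."""
               has_dyn = ctypes.c_bool(False)
               # hypothetical C API name; the real entry point may differ
               check_call(_LIB.MXSymbolHasDynamicShapeOp(self.handle, ctypes.byref(has_dyn)))
               return has_dyn.value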





[GitHub] [incubator-mxnet] leezu commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
leezu commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-665791614


   Thanks @mseth10 for investigating the root cause. Could you add a new test that contains a `reshape` after a dynamic shape op?
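
   Something along these lines would cover it (a rough sketch, assuming `np.unique` stays on the dynamic shape op list; the exact values are illustrative):

       import mxnet as mx
       from mxnet import np, npx
       from mxnet.gluon import HybridBlock
       npx.set_np()

       class UniqueReshape(HybridBlock):
           def hybrid_forward(self, F, x):
               y = F.np.unique(x)               # dynamic shape: output size depends on the data
               return F.np.reshape(y, (-1, 1))  # static shape op downstream of the dynamic one

       net = UniqueReshape()
       net.hybridize()
       out = net(np.array([1., 1., 2., 3., 3.]))
       assert out.shape == (3, 1)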



[GitHub] [incubator-mxnet] waytrue17 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
waytrue17 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r511155116



##########
File path: include/mxnet/base.h
##########
@@ -532,6 +532,9 @@ inline std::ostream& operator<<(std::ostream &out, const Context &ctx) {
   return out;
 }
 
+// convert nnvm symbol to a nnvm graph.
+nnvm::Graph Symbol2Graph(const nnvm::Symbol &s);

Review comment:
       Right, I'll remove it in the next commit





[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: Optimize graph execution in presence of dynamic shape op

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r516993737



##########
File path: src/operator/subgraph/build_subgraph.cc
##########
@@ -329,7 +368,21 @@ void PreSelectSubgraphNodes(const nnvm::Graph& g, SubgraphSelectorV2Ptr subgraph
     }
     ++count;
   }
-  if (!success) {
+  if (success) {
+    // CachedOp requires external input during forward pass
+    // check subgraph input. If none, reject the first op (in top order) from the subgraph
+    // to make sure CachedOp gets external input.
+    // this feature can be switched off by setting ensure_CachedOp_input to false
+    const SubgraphPropertyPtr& subg_prop = g.GetAttr<SubgraphPropertyPtr>("subgraph_property");
+    if (subg_prop->HasAttr("ensure_CachedOp_input")
+        && subg_prop->GetAttr<bool>("ensure_CachedOp_input")) {
+      if (subgraph_nodes->size() > 0 && !HasInputEntries(g, simple_nodes, *subgraph_nodes)) {
+        // relabel the node to -1
+        (*subgraph_nodes)[0]->label = -1;
+        subgraph_nodes->erase(subgraph_nodes->begin());

Review comment:
       If we keep this logic here, it can also be used by other subgraph properties by setting the `ensure_CachedOp_input` flag.





[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r455544340



##########
File path: python/mxnet/symbol/symbol.py
##########
@@ -2627,6 +2628,15 @@ def detach(self):
     def backward(self):
         raise NotImplementedForSymbol(self.backward, None)
 
+    def optimize_for_dynamic_shape_op(self):
+        """Check if any dynamic shape op presents in the symbol, if yes, partition all static shape ops for optimization
+        returns the optimized symbol.
+        """
+        out = SymbolHandle()
+        check_call(_LIB.MXOptimizeForDynamicShapeOp(self.handle, ctypes.byref(out)))
+        from .numpy import _Symbol as np_symbol

Review comment:
       you can import it once at the top of the file
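
       i.e. something like this sketch, assuming the top-level import does not create a circular dependency between `symbol.py` and the numpy symbol module:

           # near the other imports at the top of python/mxnet/symbol/symbol.py
           from .numpy import _Symbol as np_symbol

       `optimize_for_dynamic_shape_op` can then use `np_symbol` directly instead of re-importing it on every call.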





[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#discussion_r456645837



##########
File path: src/operator/subgraph/static_shape_subgraph_property.cc
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "./common.h"
+#include "./subgraph_property.h"
+#include "../../imperative/cached_op.h"
+
+namespace mxnet {
+namespace op {
+
+/*
+ * This selector picks nodes for a subgraph that contains only static shape operators;
+ * it visits nodes via both input and output links.
+ */
+class StaticShapeOpSelector: public SubgraphSelector {
+ public:
+  virtual bool Select(const nnvm::Node &seed_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !seed_node.is_variable() && infershape.count(seed_node.op());
+  }
+
+  virtual bool SelectInput(const nnvm::Node &cur_node, const nnvm::Node &input_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !input_node.is_variable() && infershape.count(input_node.op());
+  }
+
+  virtual bool SelectOutput(const nnvm::Node &cur_node, const nnvm::Node &output_node) {
+    const auto& infershape = nnvm::Op::GetAttr<mxnet::FInferShape>("FInferShape");
+    return !output_node.is_variable() && infershape.count(output_node.op());
+  }

Review comment:
       What is the advantage of creating a separate subgraph / cachedop for (the special case of) single nodes "surrounded" by dynamic shape operators?





[GitHub] [incubator-mxnet] mseth10 commented on pull request #18690: [WIP] optimize graph in presence of dynamic shape ops

Posted by GitBox <gi...@apache.org>.
mseth10 commented on pull request #18690:
URL: https://github.com/apache/incubator-mxnet/pull/18690#issuecomment-656891609


   @mxnet-label-bot add [Operator, Backend, pr-work-in-progress]

