Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/04/13 16:29:41 UTC

[GitHub] [tvm] AndrewZhaoLuo opened a new pull request, #11000: [Graph Debugger] Expose way to benchmark individual nodes.

AndrewZhaoLuo opened a new pull request, #11000:
URL: https://github.com/apache/tvm/pull/11000

   This is useful when we only want to benchmark a single node, or a set of nodes, in the graph.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] tkonolige commented on a diff in pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

tkonolige commented on code in PR #11000:
URL: https://github.com/apache/tvm/pull/11000#discussion_r858084865


##########
tests/python/unittest/test_runtime_graph_debug.py:
##########
@@ -185,5 +191,47 @@ def check_remote(server):
     check_remote(rpc.Server("127.0.0.1"))
 
 
+@tvm.testing.requires_llvm
+def test_run_single_node(graph):
+    mlib_proxy = tvm.support.FrontendTestModule()
+    mlib_proxy["myadd"] = myadd
+    try:
+        mod: debug_executor.GraphModuleDebug = debug_executor.create(graph, mlib_proxy, tvm.cpu(0))
+    except ValueError:
+        return

Review Comment:
   Seems like we should throw an exception if there is an error.



##########
tests/python/unittest/test_runtime_graph_debug.py:
##########
@@ -185,5 +191,47 @@ def check_remote(server):
     check_remote(rpc.Server("127.0.0.1"))
 
 
+@tvm.testing.requires_llvm
+def test_run_single_node(graph):
+    mlib_proxy = tvm.support.FrontendTestModule()
+    mlib_proxy["myadd"] = myadd
+    try:
+        mod: debug_executor.GraphModuleDebug = debug_executor.create(graph, mlib_proxy, tvm.cpu(0))
+    except ValueError:
+        return
+
+    a = np.random.uniform(size=(n,)).astype(A.dtype)
+    mod.set_input(x=a)
+
+    assert len(mod.debug_datum.get_graph_nodes()) == 2
+    assert mod.debug_datum.get_graph_nodes()[0]["op"] == "param"
+    assert mod.debug_datum.get_graph_nodes()[1]["op"] == "myadd"
+
+    # Running a node with no associated function should return instantly and have 0 runtime
+    mod.run_individual_node(0, number=1).mean == 0
+
+    # Meanwhile the actual function should take some time, more time if you run it more times
+    repeat_1_result = mod.run_individual_node(1, repeat=1)
+    repeat_1_result.mean > 0

Review Comment:
   This should be an assert
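
To illustrate why this matters: a bare comparison in Python is evaluated and its value discarded, so a test can never fail on it. The sketch below uses a hypothetical `FakeResult` class as a stand-in for the real result object (it is not TVM's type) and contrasts the two forms.

```python
class FakeResult:
    """Hypothetical stand-in for a benchmark result with a `mean` attribute."""

    def __init__(self, mean):
        self.mean = mean


def check_without_assert(result):
    # The comparison is evaluated and its value thrown away: this never fails.
    result.mean > 0
    return "passed"


def check_with_assert(result):
    # The assert raises AssertionError when the condition is false.
    assert result.mean > 0
    return "passed"


bad = FakeResult(mean=0.0)
print(check_without_assert(bad))  # the bad result goes unnoticed
try:
    check_with_assert(bad)
except AssertionError:
    print("assert caught the bad result")
```
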



##########
tests/python/unittest/test_runtime_graph_debug.py:
##########
@@ -19,26 +19,34 @@
 import re
 import sys
 import time
+from distutils.log import debug
 
+import numpy as np
 import pytest
-
 import tvm
 import tvm.testing
-from tvm import te
-import numpy as np
-from tvm import rpc
+from tvm import rpc, te
+from tvm._ffi.base import TVMError
 from tvm.contrib import utils
 from tvm.contrib.debugger import debug_executor
 
+# Constants for creating simple graphs
+n = 4
+A = te.placeholder((n,), name="A")
+B = te.compute(A.shape, lambda *i: A(*i) + 1.0, name="B")
+s = te.create_schedule(B.op)

Review Comment:
   Put these inside the functions using them or use fixtures. Freestanding globals like this may cause issues with multiprocessing.
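
One way to follow this suggestion, sketched under assumptions: move the module-level constants into a builder function that each test calls. The helper name `make_simple_graph_constants` is hypothetical, and plain Python values stand in for the TE placeholder, compute op, and schedule that the real test would create.

```python
def make_simple_graph_constants(n=4):
    """Build the per-test constants on demand instead of at import time.

    Hypothetical helper: the real test would create the TE placeholder,
    compute op, and schedule here. Plain Python values stand in for them.
    """
    shape = (n,)
    data = [0.0] * n
    return {"n": n, "shape": shape, "data": data}


def test_first():
    consts = make_simple_graph_constants()
    consts["data"][0] = 1.0  # the mutation stays local to this test
    assert consts["data"][0] == 1.0


def test_second():
    consts = make_simple_graph_constants()
    # A fresh copy: not affected by what test_first did.
    assert consts["data"] == [0.0, 0.0, 0.0, 0.0]


test_first()
test_second()
```

Each test gets its own objects, so nothing is shared through module state. A pytest fixture would give the same isolation with less call-site code.
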



##########
tests/python/unittest/test_runtime_graph_debug.py:
##########
@@ -185,5 +191,47 @@ def check_remote(server):
     check_remote(rpc.Server("127.0.0.1"))
 
 
+@tvm.testing.requires_llvm
+def test_run_single_node(graph):
+    mlib_proxy = tvm.support.FrontendTestModule()
+    mlib_proxy["myadd"] = myadd
+    try:
+        mod: debug_executor.GraphModuleDebug = debug_executor.create(graph, mlib_proxy, tvm.cpu(0))
+    except ValueError:
+        return
+
+    a = np.random.uniform(size=(n,)).astype(A.dtype)
+    mod.set_input(x=a)
+
+    assert len(mod.debug_datum.get_graph_nodes()) == 2
+    assert mod.debug_datum.get_graph_nodes()[0]["op"] == "param"
+    assert mod.debug_datum.get_graph_nodes()[1]["op"] == "myadd"
+
+    # Running a node with no associated function should return instantly and have 0 runtime
+    mod.run_individual_node(0, number=1).mean == 0

Review Comment:
   This should be an assert




[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

AndrewZhaoLuo commented on code in PR #11000:
URL: https://github.com/apache/tvm/pull/11000#discussion_r856618606


##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):
+        """Benchmark a single node in the serialized graph.
+
+        Parameters
+        ----------
+        index : int
+            The index of the node, see `self.debug_datum.get_graph_nodes`
+
+        number: int
+            The number of times to run the node to get a benchmark result.
+
+        repeat: int
+            The number of times to benchmark the nodes.

Review Comment:
   Done



##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):
+        """Benchmark a single node in the serialized graph.
+
+        Parameters
+        ----------
+        index : int
+            The index of the node, see `self.debug_datum.get_graph_nodes`
+
+        number: int
+            The number of times to run the node to get a benchmark result.
+
+        repeat: int
+            The number of times to benchmark the nodes.
+
+        min_repeat_ms: int
+            The minimum consecutive runtime of the node for a benchmark result.
+
+        Returns
+        -------
+        A list of dimensions `number` x `repeat` each one the runtime of the node

Review Comment:
   Done




[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

AndrewZhaoLuo commented on code in PR #11000:
URL: https://github.com/apache/tvm/pull/11000#discussion_r858230661


##########
tests/python/unittest/test_runtime_graph_debug.py:
##########
@@ -185,5 +191,47 @@ def check_remote(server):
     check_remote(rpc.Server("127.0.0.1"))
 
 
+@tvm.testing.requires_llvm
+def test_run_single_node(graph):
+    mlib_proxy = tvm.support.FrontendTestModule()
+    mlib_proxy["myadd"] = myadd
+    try:
+        mod: debug_executor.GraphModuleDebug = debug_executor.create(graph, mlib_proxy, tvm.cpu(0))
+    except ValueError:
+        return

Review Comment:
   Cool, done




[GitHub] [tvm] tkonolige commented on a diff in pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

tkonolige commented on code in PR #11000:
URL: https://github.com/apache/tvm/pull/11000#discussion_r852464986


##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):
+        """Benchmark a single node in the serialized graph.
+
+        Parameters
+        ----------
+        index : int
+            The index of the node, see `self.debug_datum.get_graph_nodes`
+
+        number: int
+            The number of times to run the node to get a benchmark result.
+
+        repeat: int
+            The number of times to benchmark the nodes.
+
+        min_repeat_ms: int
+            The minimum consecutive runtime of the node for a benchmark result.
+
+        Returns
+        -------
+        A list of dimensions `number` x `repeat` each one the runtime of the node

Review Comment:
   `BenchmarkResult` is consistent with other interfaces, so using that would be best.



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -114,15 +94,69 @@ class GraphExecutorDebug : public GraphExecutor {
 
     std::ostringstream os;
     for (size_t index = 0; index < time_sec_per_op.size(); index++) {
-      os << time_sec_per_op[index] << ",";
+      double time = time_sec_per_op[index];
+      // To have good behavior when calculating total time, etc.
+      if (isnan(time)) {
+        time = 0;
+      }
+      os << time << ",";
     }
     return os.str();
   }
 
+  std::vector<std::vector<double>> RunIndividualNode(int node_index, int number, int repeat,
+                                                     int min_repeat_ms) {
+    // warmup run
+    // GraphExecutor::Run();
+    std::string tkey = module_->type_key();
+
+    // results_in_seconds[a][b] is the bth index run of the ath index repeat
+    std::vector<std::vector<double>> results_in_seconds;
+
+    if (tkey == "rpc") {
+      LOG(FATAL) << "RPC measurements should not use RunIndividualNode!";
+    }
+
+    for (int i = 0; i < repeat; ++i) {
+      std::vector<Timer> op_timers;
+      double duration_ms = 0.0;
+
+      // Keep timing operations, upping number of repeats until we reach min_repeat_ms
+      do {
+        op_timers.clear();
+        if (duration_ms > 0.0) {
+          number = static_cast<int>(std::max((min_repeat_ms / (duration_ms / number) + 1),
+                                             number * 1.618));  // 1.618 is chosen by random
+        }
+
+        std::chrono::time_point<std::chrono::high_resolution_clock, std::chrono::nanoseconds>
+            tbegin, tend;
+        tbegin = std::chrono::high_resolution_clock::now();

Review Comment:
   Here is the main loop from time evaluator: https://github.com/apache/tvm/blob/main/src/runtime/rpc/rpc_module.cc#L370-L403. Here is the timer interface: https://github.com/apache/tvm/blob/main/include/tvm/runtime/profiling.h#L43-L142.
   
   Using time_evaluator is probably a better choice than reimplementing the timing loop.
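
For readers unfamiliar with this kind of timing loop, here is a minimal Python sketch of the logic in the quoted diff: run the function `number` times per measurement, and if the measurement is shorter than `min_repeat_ms`, grow `number` and measure again. The function name `time_function` is hypothetical, and this only illustrates the idea; it is not TVM's time evaluator.

```python
import time


def time_function(func, number, repeat=1, min_repeat_ms=0):
    """Return one mean per-call time in seconds for each repeat."""
    results = []
    for _ in range(repeat):
        duration_ms = 0.0
        while True:
            if duration_ms > 0.0:
                # The last attempt was too short: grow `number` so the next
                # attempt is likely to reach min_repeat_ms. The 1.618 growth
                # factor is taken from the quoted diff.
                number = int(
                    max(min_repeat_ms / (duration_ms / number) + 1, number * 1.618)
                )
            begin = time.perf_counter()
            for _ in range(number):
                func()
            duration_ms = (time.perf_counter() - begin) * 1000.0
            if duration_ms >= min_repeat_ms:
                break
        # Average over the runs in this measurement.
        results.append(duration_ms / 1000.0 / number)
    return results


timings = time_function(lambda: sum(range(100)), number=1, repeat=3, min_repeat_ms=1)
print(len(timings))
```

As in the quoted C++ code, `number` keeps its grown value for later repeats.
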




[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

AndrewZhaoLuo commented on code in PR #11000:
URL: https://github.com/apache/tvm/pull/11000#discussion_r858100679


##########
tests/python/unittest/test_runtime_graph_debug.py:
##########
@@ -19,26 +19,34 @@
 import re
 import sys
 import time
+from distutils.log import debug
 
+import numpy as np
 import pytest
-
 import tvm
 import tvm.testing
-from tvm import te
-import numpy as np
-from tvm import rpc
+from tvm import rpc, te
+from tvm._ffi.base import TVMError
 from tvm.contrib import utils
 from tvm.contrib.debugger import debug_executor
 
+# Constants for creating simple graphs
+n = 4
+A = te.placeholder((n,), name="A")
+B = te.compute(A.shape, lambda *i: A(*i) + 1.0, name="B")
+s = te.create_schedule(B.op)

Review Comment:
   Done



##########
tests/python/unittest/test_runtime_graph_debug.py:
##########
@@ -185,5 +191,47 @@ def check_remote(server):
     check_remote(rpc.Server("127.0.0.1"))
 
 
+@tvm.testing.requires_llvm
+def test_run_single_node(graph):
+    mlib_proxy = tvm.support.FrontendTestModule()
+    mlib_proxy["myadd"] = myadd
+    try:
+        mod: debug_executor.GraphModuleDebug = debug_executor.create(graph, mlib_proxy, tvm.cpu(0))
+    except ValueError:
+        return
+
+    a = np.random.uniform(size=(n,)).astype(A.dtype)
+    mod.set_input(x=a)
+
+    assert len(mod.debug_datum.get_graph_nodes()) == 2
+    assert mod.debug_datum.get_graph_nodes()[0]["op"] == "param"
+    assert mod.debug_datum.get_graph_nodes()[1]["op"] == "myadd"
+
+    # Running a node with no associated function should return instantly and have 0 runtime
+    mod.run_individual_node(0, number=1).mean == 0

Review Comment:
   Oopsy, done



##########
tests/python/unittest/test_runtime_graph_debug.py:
##########
@@ -185,5 +191,47 @@ def check_remote(server):
     check_remote(rpc.Server("127.0.0.1"))
 
 
+@tvm.testing.requires_llvm
+def test_run_single_node(graph):
+    mlib_proxy = tvm.support.FrontendTestModule()
+    mlib_proxy["myadd"] = myadd
+    try:
+        mod: debug_executor.GraphModuleDebug = debug_executor.create(graph, mlib_proxy, tvm.cpu(0))
+    except ValueError:
+        return
+
+    a = np.random.uniform(size=(n,)).astype(A.dtype)
+    mod.set_input(x=a)
+
+    assert len(mod.debug_datum.get_graph_nodes()) == 2
+    assert mod.debug_datum.get_graph_nodes()[0]["op"] == "param"
+    assert mod.debug_datum.get_graph_nodes()[1]["op"] == "myadd"
+
+    # Running a node with no associated function should return instantly and have 0 runtime
+    mod.run_individual_node(0, number=1).mean == 0
+
+    # Meanwhile the actual function should take some time, more time if you run it more times
+    repeat_1_result = mod.run_individual_node(1, repeat=1)
+    repeat_1_result.mean > 0

Review Comment:
   Oopsy, done



##########
tests/python/unittest/test_runtime_graph_debug.py:
##########
@@ -185,5 +191,47 @@ def check_remote(server):
     check_remote(rpc.Server("127.0.0.1"))
 
 
+@tvm.testing.requires_llvm
+def test_run_single_node(graph):
+    mlib_proxy = tvm.support.FrontendTestModule()
+    mlib_proxy["myadd"] = myadd
+    try:
+        mod: debug_executor.GraphModuleDebug = debug_executor.create(graph, mlib_proxy, tvm.cpu(0))
+    except ValueError:
+        return

Review Comment:
   This will be hit if TVM is not built with the profiler on. Added a comment clarifying




[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

AndrewZhaoLuo commented on code in PR #11000:
URL: https://github.com/apache/tvm/pull/11000#discussion_r852460978


##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):
+        """Benchmark a single node in the serialized graph.

Review Comment:
   Done



##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):
+        """Benchmark a single node in the serialized graph.
+
+        Parameters
+        ----------
+        index : int
+            The index of the node, see `self.debug_datum.get_graph_nodes`
+
+        number: int
+            The number of times to run the node to get a benchmark result.
+
+        repeat: int
+            The number of times to benchmark the nodes.
+
+        min_repeat_ms: int
+            The minimum consecutive runtime of the node for a benchmark result.
+
+        Returns
+        -------
+        A list of dimensions `number` x `repeat` each one the runtime of the node

Review Comment:
   Basically, if you have 3 repeats of 3 numbers, it would return a 3x3 array/list.
   
   arr[0][1] would be repeat 1, number 2; arr[1][2] would be repeat 2, number 3; etc.
   
   I think BenchmarkResult is better though since it seems to store the sequence of all float results anyway?
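
The layout described above can be sketched with a nested list indexed as `arr[repeat][number]`. The values are made up, and `summarize` is a hypothetical helper that only mimics the kind of summary a `BenchmarkResult`-style object might expose; it is not TVM's implementation.

```python
import statistics


def summarize(results):
    """Flatten results[repeat][number] and compute summary statistics.

    Hypothetical helper; it only mimics the kind of summary a
    BenchmarkResult-style object exposes.
    """
    flat = [t for row in results for t in row]
    return {
        "mean": statistics.mean(flat),
        "median": statistics.median(flat),
        "min": min(flat),
        "max": max(flat),
    }


# 3 repeats x 3 numbers; values are made-up runtimes in seconds.
arr = [
    [0.010, 0.011, 0.012],  # repeat 1
    [0.020, 0.021, 0.022],  # repeat 2
    [0.030, 0.031, 0.032],  # repeat 3
]

print(arr[0][1])  # repeat 1, number 2
print(arr[1][2])  # repeat 2, number 3
print(summarize(arr)["median"])
```
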



##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):

Review Comment:
   done



##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):
+        """Benchmark a single node in the serialized graph.
+
+        Parameters
+        ----------
+        index : int
+            The index of the node, see `self.debug_datum.get_graph_nodes`
+
+        number: int
+            The number of times to run the node to get a benchmark result.
+
+        repeat: int
+            The number of times to benchmark the nodes.

Review Comment:
   I'll probably just use time_evaluator, so will change later.



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -362,6 +396,33 @@ PackedFunc GraphExecutorDebug::GetFunction(const std::string& name,
       ICHECK_GE(min_repeat_ms, 0);
       *rv = this->RunIndividual(number, repeat, min_repeat_ms);
     });
+  } else if (name == "run_individual_node") {
+    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
+      int node_index = args[0];
+      int number = args[1];
+      int repeat = args[2];
+      int min_repeat_ms = args[3];
+      ICHECK_GE(node_index, 0);
+      ICHECK_LT(node_index, nodes_.size());
+      ICHECK_GT(number, 0);
+      ICHECK_GT(repeat, 0);
+      ICHECK_GE(min_repeat_ms, 0);
+      std::vector<std::vector<double>> results =
+          this->RunIndividualNode(node_index, number, repeat, min_repeat_ms);
+
+      std::stringstream s;
+      s.precision(6);  // down to microseconds

Review Comment:
   Done



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -114,15 +94,69 @@ class GraphExecutorDebug : public GraphExecutor {
 
     std::ostringstream os;
     for (size_t index = 0; index < time_sec_per_op.size(); index++) {
-      os << time_sec_per_op[index] << ",";
+      double time = time_sec_per_op[index];
+      // To have good behavior when calculating total time, etc.
+      if (isnan(time)) {
+        time = 0;
+      }

Review Comment:
   0 / 0 is possible from the above for nodes which do not have any associated execution function
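
A small sketch of why the NaN check helps when totalling per-node times. The values are made up; the NaN entry stands for a node whose time came out as 0 / 0. In Python, `0.0 / 0.0` raises an exception rather than producing NaN, so the NaN is written directly here.

```python
import math

# Made-up per-node times; NaN stands for a node whose time came out as 0 / 0.
times = [0.002, float("nan"), 0.003]

# A single NaN makes the total NaN as well.
print(math.isnan(sum(times)))  # True

# Replacing NaN with 0 keeps the total meaningful.
cleaned = [0.0 if math.isnan(t) else t for t in times]
print(sum(cleaned))
```
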



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -114,15 +94,69 @@ class GraphExecutorDebug : public GraphExecutor {
 
     std::ostringstream os;
     for (size_t index = 0; index < time_sec_per_op.size(); index++) {
-      os << time_sec_per_op[index] << ",";
+      double time = time_sec_per_op[index];
+      // To have good behavior when calculating total time, etc.
+      if (isnan(time)) {
+        time = 0;
+      }
+      os << time << ",";
     }
     return os.str();
   }
 
+  std::vector<std::vector<double>> RunIndividualNode(int node_index, int number, int repeat,
+                                                     int min_repeat_ms) {
+    // warmup run
+    // GraphExecutor::Run();
+    std::string tkey = module_->type_key();
+
+    // results_in_seconds[a][b] is the bth index run of the ath index repeat
+    std::vector<std::vector<double>> results_in_seconds;
+
+    if (tkey == "rpc") {
+      LOG(FATAL) << "RPC measurements should not use RunIndividualNode!";
+    }
+
+    for (int i = 0; i < repeat; ++i) {
+      std::vector<Timer> op_timers;
+      double duration_ms = 0.0;
+
+      // Keep timing operations, upping number of repeats until we reach min_repeat_ms
+      do {
+        op_timers.clear();
+        if (duration_ms > 0.0) {
+          number = static_cast<int>(std::max((min_repeat_ms / (duration_ms / number) + 1),
+                                             number * 1.618));  // 1.618 is chosen by random
+        }
+
+        std::chrono::time_point<std::chrono::high_resolution_clock, std::chrono::nanoseconds>
+            tbegin, tend;
+        tbegin = std::chrono::high_resolution_clock::now();

Review Comment:
   Oh you mean get rid of all of this timing code here and instead use the python time_evaluator interface?



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -114,15 +94,69 @@ class GraphExecutorDebug : public GraphExecutor {
 
     std::ostringstream os;
     for (size_t index = 0; index < time_sec_per_op.size(); index++) {
-      os << time_sec_per_op[index] << ",";
+      double time = time_sec_per_op[index];
+      // To have good behavior when calculating total time, etc.
+      if (isnan(time)) {
+        time = 0;
+      }
+      os << time << ",";
     }
     return os.str();
   }
 
+  std::vector<std::vector<double>> RunIndividualNode(int node_index, int number, int repeat,
+                                                     int min_repeat_ms) {
+    // warmup run
+    // GraphExecutor::Run();
+    std::string tkey = module_->type_key();
+
+    // results_in_seconds[a][b] is the bth index run of the ath index repeat
+    std::vector<std::vector<double>> results_in_seconds;
+
+    if (tkey == "rpc") {
+      LOG(FATAL) << "RPC measurements should not use RunIndividualNode!";
+    }
+
+    for (int i = 0; i < repeat; ++i) {
+      std::vector<Timer> op_timers;
+      double duration_ms = 0.0;
+
+      // Keep timing operations, upping number of repeats until we reach min_repeat_ms
+      do {
+        op_timers.clear();
+        if (duration_ms > 0.0) {
+          number = static_cast<int>(std::max((min_repeat_ms / (duration_ms / number) + 1),
+                                             number * 1.618));  // 1.618 is chosen by random
+        }
+
+        std::chrono::time_point<std::chrono::high_resolution_clock, std::chrono::nanoseconds>
+            tbegin, tend;
+        tbegin = std::chrono::high_resolution_clock::now();

Review Comment:
   Example?



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -362,6 +396,33 @@ PackedFunc GraphExecutorDebug::GetFunction(const std::string& name,
       ICHECK_GE(min_repeat_ms, 0);
       *rv = this->RunIndividual(number, repeat, min_repeat_ms);
     });
+  } else if (name == "run_individual_node") {
+    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
+      int node_index = args[0];
+      int number = args[1];
+      int repeat = args[2];
+      int min_repeat_ms = args[3];
+      ICHECK_GE(node_index, 0);
+      ICHECK_LT(node_index, nodes_.size());
+      ICHECK_GT(number, 0);
+      ICHECK_GT(repeat, 0);
+      ICHECK_GE(min_repeat_ms, 0);
+      std::vector<std::vector<double>> results =
+          this->RunIndividualNode(node_index, number, repeat, min_repeat_ms);
+
+      std::stringstream s;
+      s.precision(6);  // down to microseconds
+
+      for (std::vector<double>& row : results) {
+        for (double cur : row) {
+          s << cur << ", ";
+        }
+        s << "\n";
+      }
+
+      // Have problems returning Integers and FloatImm so this is hack
+      *rv = s.str();

Review Comment:
   Haha well double -> char* have problems if the endianness of the two systems is different B). Yeah IDK what the right way is. I was mostly following the string serialization approach that `run_individual` does, though I understand that this is slow over RPC
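
To make the endianness concern concrete, the sketch below packs a double in both byte orders and shows that misreading the bytes gives a different number, whereas a text format survives the round trip. The `serialize` and `parse` helpers are hypothetical; `serialize` imitates the comma-and-newline format in the quoted diff using 6 significant digits, so values needing more digits would lose precision.

```python
import struct

value = 1.5

# Raw bytes depend on byte order.
little = struct.pack("<d", value)
big = struct.pack(">d", value)
print(little == big)  # False

# Reading little-endian bytes as big-endian yields a different number.
misread = struct.unpack(">d", little)[0]
print(misread == value)  # False


def serialize(rows):
    """Write each value followed by ', ' and end each row with a newline."""
    return "".join("".join(f"{x:.6g}, " for x in row) + "\n" for row in rows)


def parse(text):
    """Inverse of serialize for this made-up text format."""
    return [
        [float(tok) for tok in line.split(",") if tok.strip()]
        for line in text.splitlines()
    ]


rows = [[1.5, 0.25], [0.125, 2.0]]
print(parse(serialize(rows)) == rows)  # True
```
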



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -362,6 +396,33 @@ PackedFunc GraphExecutorDebug::GetFunction(const std::string& name,
       ICHECK_GE(min_repeat_ms, 0);
       *rv = this->RunIndividual(number, repeat, min_repeat_ms);
     });
+  } else if (name == "run_individual_node") {
+    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {

Review Comment:
   Thanks, done



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -114,15 +94,69 @@ class GraphExecutorDebug : public GraphExecutor {
 
     std::ostringstream os;
     for (size_t index = 0; index < time_sec_per_op.size(); index++) {
-      os << time_sec_per_op[index] << ",";
+      double time = time_sec_per_op[index];
+      // To have good behavior when calculating total time, etc.
+      if (isnan(time)) {
+        time = 0;
+      }
+      os << time << ",";
     }
     return os.str();
   }
 
+  std::vector<std::vector<double>> RunIndividualNode(int node_index, int number, int repeat,
+                                                     int min_repeat_ms) {
+    // warmup run
+    // GraphExecutor::Run();
+    std::string tkey = module_->type_key();
+
+    // results_in_seconds[a][b] is the bth index run of the ath index repeat
+    std::vector<std::vector<double>> results_in_seconds;
+
+    if (tkey == "rpc") {
+      LOG(FATAL) << "RPC measurements should not use RunIndividualNode!";
+    }
+
+    for (int i = 0; i < repeat; ++i) {
+      std::vector<Timer> op_timers;
+      double duration_ms = 0.0;
+
+      // Keep timing operations, upping number of repeats until we reach min_repeat_ms
+      do {
+        op_timers.clear();
+        if (duration_ms > 0.0) {
+          number = static_cast<int>(std::max((min_repeat_ms / (duration_ms / number) + 1),
+                                             number * 1.618));  // 1.618 is chosen by random
+        }
+
+        std::chrono::time_point<std::chrono::high_resolution_clock, std::chrono::nanoseconds>
+            tbegin, tend;
+        tbegin = std::chrono::high_resolution_clock::now();

Review Comment:
   Hmm, in the refactor this needs to return some timing info (it can't be outside measuring in). Do you have an example of the timers interface, or just the file to look at? (`timers` is kind of hard to grep for.)




[GitHub] [tvm] AndrewZhaoLuo commented on pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

AndrewZhaoLuo commented on PR #11000:
URL: https://github.com/apache/tvm/pull/11000#issuecomment-1106962602

   PTAL @tkonolige 



[GitHub] [tvm] AndrewZhaoLuo commented on pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

AndrewZhaoLuo commented on PR #11000:
URL: https://github.com/apache/tvm/pull/11000#issuecomment-1098266346

   cc @tkonolige 



[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

AndrewZhaoLuo commented on code in PR #11000:
URL: https://github.com/apache/tvm/pull/11000#discussion_r856618731


##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -114,15 +94,69 @@ class GraphExecutorDebug : public GraphExecutor {
 
     std::ostringstream os;
     for (size_t index = 0; index < time_sec_per_op.size(); index++) {
-      os << time_sec_per_op[index] << ",";
+      double time = time_sec_per_op[index];
+      // To have good behavior when calculating total time, etc.
+      if (isnan(time)) {
+        time = 0;
+      }
+      os << time << ",";
     }
     return os.str();
   }
 
+  std::vector<std::vector<double>> RunIndividualNode(int node_index, int number, int repeat,
+                                                     int min_repeat_ms) {
+    // warmup run
+    // GraphExecutor::Run();
+    std::string tkey = module_->type_key();
+
+    // results_in_seconds[a][b] is the bth index run of the ath index repeat
+    std::vector<std::vector<double>> results_in_seconds;
+
+    if (tkey == "rpc") {
+      LOG(FATAL) << "RPC measurements should not use RunIndividualNode!";
+    }
+
+    for (int i = 0; i < repeat; ++i) {
+      std::vector<Timer> op_timers;
+      double duration_ms = 0.0;
+
+      // Keep timing operations, upping number of repeats until we reach min_repeat_ms
+      do {
+        op_timers.clear();
+        if (duration_ms > 0.0) {
+          number = static_cast<int>(std::max((min_repeat_ms / (duration_ms / number) + 1),
+                                             number * 1.618));  // 1.618 is chosen by random
+        }
+
+        std::chrono::time_point<std::chrono::high_resolution_clock, std::chrono::nanoseconds>
+            tbegin, tend;
+        tbegin = std::chrono::high_resolution_clock::now();

Review Comment:
   Done, it now uses the linked time evaluator. Lots of deduped code B)




[GitHub] [tvm] AndrewZhaoLuo commented on pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

AndrewZhaoLuo commented on PR #11000:
URL: https://github.com/apache/tvm/pull/11000#issuecomment-1109090175

   > The code look good, but could you add a test case?
   
   Done



[GitHub] [tvm] tkonolige commented on a diff in pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

tkonolige commented on code in PR #11000:
URL: https://github.com/apache/tvm/pull/11000#discussion_r852420219


##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):
+        """Benchmark a single node in the serialized graph.

Review Comment:
   Can you specify that this does not do any data transfer and uses arrays that are already on the device?



##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):
+        """Benchmark a single node in the serialized graph.
+
+        Parameters
+        ----------
+        index : int
+            The index of the node, see `self.debug_datum.get_graph_nodes`
+
+        number: int
+            The number of times to run the node to get a benchmark result.
+
+        repeat: int
+            The number of times to benchmark the nodes.

Review Comment:
   Can you use the same language as the `time_evaluator` docs here, or just point to them?



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -362,6 +396,33 @@ PackedFunc GraphExecutorDebug::GetFunction(const std::string& name,
       ICHECK_GE(min_repeat_ms, 0);
       *rv = this->RunIndividual(number, repeat, min_repeat_ms);
     });
+  } else if (name == "run_individual_node") {
+    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
+      int node_index = args[0];
+      int number = args[1];
+      int repeat = args[2];
+      int min_repeat_ms = args[3];
+      ICHECK_GE(node_index, 0);
+      ICHECK_LT(node_index, nodes_.size());
+      ICHECK_GT(number, 0);
+      ICHECK_GT(repeat, 0);
+      ICHECK_GE(min_repeat_ms, 0);
+      std::vector<std::vector<double>> results =
+          this->RunIndividualNode(node_index, number, repeat, min_repeat_ms);
+
+      std::stringstream s;
+      s.precision(6);  // down to microseconds

Review Comment:
   This should use the maximum precision available and `std::fixed` to avoid any issues with rounding (which we've encountered in tests before).
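
The rounding concern can be illustrated with a small Python sketch (not part of the PR). It assumes that a C++ stream with `precision(6)` and no `std::fixed` formats doubles like printf's `%g` with six significant digits, which Python's `.6g` format mirrors. The sample value is made up.

```python
# Hypothetical per-run timing in seconds; any double with more than six
# significant digits shows the same effect.
measured = 1.2345678912345e-05

# Six significant digits, analogous to stream precision(6) without std::fixed.
short_text = format(measured, ".6g")

# 17 significant digits are enough to round-trip any IEEE-754 double.
full_text = format(measured, ".17g")

print(short_text)
print(float(short_text) == measured)  # False: digits were lost
print(float(full_text) == measured)   # True: value survives the text round trip
```

Note that six significant digits is not the same as "down to microseconds": the latter would need fixed notation with six digits after the decimal point.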



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -114,15 +94,69 @@ class GraphExecutorDebug : public GraphExecutor {
 
     std::ostringstream os;
     for (size_t index = 0; index < time_sec_per_op.size(); index++) {
-      os << time_sec_per_op[index] << ",";
+      double time = time_sec_per_op[index];
+      // To have good behavior when calculating total time, etc.
+      if (isnan(time)) {
+        time = 0;
+      }

Review Comment:
   I don't really understand how `time_sec_per_op` could become nan, but I think this is probably a good change regardless.
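
As a rough illustration of why the clamp helps when totals are computed, here is a Python sketch with made-up timings (it is not code from the PR). A single NaN entry turns the whole sum into NaN, while replacing NaN with zero keeps the total usable.

```python
import math

# Hypothetical per-op timings in seconds; one entry is NaN.
time_sec_per_op = [0.002, float("nan"), 0.001]

# A single NaN makes the whole sum NaN.
raw_total = sum(time_sec_per_op)

# Mirroring the change under review: treat NaN entries as zero before summing.
cleaned = [0.0 if math.isnan(t) else t for t in time_sec_per_op]
clean_total = sum(cleaned)

print(math.isnan(raw_total))    # True
print(math.isnan(clean_total))  # False
```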



##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):

Review Comment:
   Number should default to the same as `time_evaluator` (10 I think).



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -114,15 +94,69 @@ class GraphExecutorDebug : public GraphExecutor {
 
     std::ostringstream os;
     for (size_t index = 0; index < time_sec_per_op.size(); index++) {
-      os << time_sec_per_op[index] << ",";
+      double time = time_sec_per_op[index];
+      // To have good behavior when calculating total time, etc.
+      if (isnan(time)) {
+        time = 0;
+      }
+      os << time << ",";
     }
     return os.str();
   }
 
+  std::vector<std::vector<double>> RunIndividualNode(int node_index, int number, int repeat,
+                                                     int min_repeat_ms) {
+    // warmup run
+    // GraphExecutor::Run();
+    std::string tkey = module_->type_key();
+
+    // results_in_seconds[a][b] is the bth index run of the ath index repeat
+    std::vector<std::vector<double>> results_in_seconds;
+
+    if (tkey == "rpc") {
+      LOG(FATAL) << "RPC measurements should not use RunIndividualNode!";
+    }
+
+    for (int i = 0; i < repeat; ++i) {
+      std::vector<Timer> op_timers;
+      double duration_ms = 0.0;
+
+      // Keep timing operations, upping number of repeats until we reach min_repeat_ms
+      do {
+        op_timers.clear();
+        if (duration_ms > 0.0) {
+          number = static_cast<int>(std::max((min_repeat_ms / (duration_ms / number) + 1),
+                                             number * 1.618));  // 1.618 is chosen by random
+        }
+
+        std::chrono::time_point<std::chrono::high_resolution_clock, std::chrono::nanoseconds>
+            tbegin, tend;
+        tbegin = std::chrono::high_resolution_clock::now();

Review Comment:
   I know you just moved the code, but this should really use the timers interface. Or better yet, just call out to time_evaluator.
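
For readers unfamiliar with the loop being discussed, the following Python sketch mirrors the structure of the quoted C++ hunk: each repeat times `number` calls and grows `number` until one timed batch lasts at least `min_repeat_ms`. The helper name `benchmark` is hypothetical, and this is not TVM's `time_evaluator` implementation.

```python
import time


def benchmark(func, number=10, repeat=1, min_repeat_ms=0):
    """Return `repeat` mean per-call times in seconds (illustrative helper)."""
    results = []
    for _ in range(repeat):
        duration_ms = 0.0
        while True:
            if duration_ms > 0.0:
                # Grow `number` so the next batch should reach min_repeat_ms;
                # the 1.618 growth floor is taken from the quoted C++ code.
                per_run_ms = duration_ms / number
                number = int(max(min_repeat_ms / per_run_ms + 1, number * 1.618))
            start = time.perf_counter()
            for _ in range(number):
                func()
            duration_ms = (time.perf_counter() - start) * 1000.0
            if duration_ms >= min_repeat_ms:
                break
        # Mean time per call for this repeat, in seconds.
        results.append(duration_ms / 1000.0 / number)
    return results


timings = benchmark(lambda: sum(range(100)), number=5, repeat=3, min_repeat_ms=1)
print(len(timings))  # 3
```

As in the quoted code, the grown `number` carries over to later repeats, so only the first repeat pays for the calibration attempts.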



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -362,6 +396,33 @@ PackedFunc GraphExecutorDebug::GetFunction(const std::string& name,
       ICHECK_GE(min_repeat_ms, 0);
       *rv = this->RunIndividual(number, repeat, min_repeat_ms);
     });
+  } else if (name == "run_individual_node") {
+    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
+      int node_index = args[0];
+      int number = args[1];
+      int repeat = args[2];
+      int min_repeat_ms = args[3];
+      ICHECK_GE(node_index, 0);
+      ICHECK_LT(node_index, nodes_.size());
+      ICHECK_GT(number, 0);
+      ICHECK_GT(repeat, 0);
+      ICHECK_GE(min_repeat_ms, 0);
+      std::vector<std::vector<double>> results =
+          this->RunIndividualNode(node_index, number, repeat, min_repeat_ms);
+
+      std::stringstream s;
+      s.precision(6);  // down to microseconds
+
+      for (std::vector<double>& row : results) {
+        for (double cur : row) {
+          s << cur << ", ";
+        }
+        s << "\n";
+      }
+
+      // Have problems returning Integers and FloatImm so this is hack
+      *rv = s.str();

Review Comment:
   We really could use support for sending arrays of floats and ints over RPC. Note that `time_evaluator` just casts an array of `double` to `char*` and sends that. Not sure it is a better approach though.
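
The trade-off between a text encoding and a raw-bytes encoding can be sketched in Python with the standard `struct` module. The layout below (an 8-byte count followed by little-endian doubles) is an assumption made for illustration, not the wire format TVM uses.

```python
import struct

# Hypothetical timings in seconds for one repeat.
timings = [1.2345678912345e-05, 0.25, 3.0]

# Pack as a count followed by little-endian 64-bit doubles (assumed layout).
payload = struct.pack("<q", len(timings)) + struct.pack(f"<{len(timings)}d", *timings)

# Receiver side: read the count, then that many doubles.
(count,) = struct.unpack_from("<q", payload, 0)
decoded = list(struct.unpack_from(f"<{count}d", payload, 8))

print(len(payload))        # 32 bytes: 8 for the count, 8 per double
print(decoded == timings)  # True: no precision is lost
```

A byte payload avoids the formatting precision question entirely, at the cost of both sides having to agree on endianness and layout.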



##########
src/runtime/graph_executor/debug/graph_executor_debug.cc:
##########
@@ -362,6 +396,33 @@ PackedFunc GraphExecutorDebug::GetFunction(const std::string& name,
       ICHECK_GE(min_repeat_ms, 0);
       *rv = this->RunIndividual(number, repeat, min_repeat_ms);
     });
+  } else if (name == "run_individual_node") {
+    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {

Review Comment:
   FYI, you can use `TypedPackedFunc` with a lambda to avoid manually unpacking the args.



##########
python/tvm/contrib/debugger/debug_executor.py:
##########
@@ -281,6 +282,42 @@ def run_individual(self, number, repeat=1, min_repeat_ms=0):
         ret = self._run_individual(number, repeat, min_repeat_ms)
         return ret.strip(",").split(",") if ret else []
 
+    def run_individual_node(self, index, number, repeat=1, min_repeat_ms=0):
+        """Benchmark a single node in the serialized graph.
+
+        Parameters
+        ----------
+        index : int
+            The index of the node, see `self.debug_datum.get_graph_nodes`
+
+        number: int
+            The number of times to run the node to get a benchmark result.
+
+        repeat: int
+            The number of times to benchmark the nodes.
+
+        min_repeat_ms: int
+            The minimum consecutive runtime of the node for a benchmark result.
+
+        Returns
+        -------
+        A list of dimensions `number` x `repeat` each one the runtime of the node

Review Comment:
   I don't really understand this message. Also, would it make sense to return an array of `BenchmarkResult` to match `time_evaluator`?
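
One possible reading of the return value, based on the quoted C++ hunk: each inner vector is written as one line of `, `-separated values, and the comment on `results_in_seconds[a][b]` suggests one row per repeat. Under that assumption, a caller could parse the string as sketched below. The helper name `parse_node_timings` and the sample string are hypothetical, and the final version of the PR may return something different.

```python
def parse_node_timings(text):
    """Parse rows of comma-separated seconds into a list of lists of floats."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split(",")]
        # Drop the empty cell left by the trailing ", " on each row.
        values = [float(cell) for cell in cells if cell]
        if values:
            rows.append(values)
    return rows


# Two repeats with two runs each, in the assumed text layout.
sample = "0.001, 0.002, \n0.003, 0.004, \n"
print(parse_node_timings(sample))  # [[0.001, 0.002], [0.003, 0.004]]
```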




[GitHub] [tvm] tkonolige commented on a diff in pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

tkonolige commented on code in PR #11000:
URL: https://github.com/apache/tvm/pull/11000#discussion_r858107639


##########
tests/python/unittest/test_runtime_graph_debug.py:
##########
@@ -185,5 +191,47 @@ def check_remote(server):
     check_remote(rpc.Server("127.0.0.1"))
 
 
+@tvm.testing.requires_llvm
+def test_run_single_node(graph):
+    mlib_proxy = tvm.support.FrontendTestModule()
+    mlib_proxy["myadd"] = myadd
+    try:
+        mod: debug_executor.GraphModuleDebug = debug_executor.create(graph, mlib_proxy, tvm.cpu(0))
+    except ValueError:
+        return

Review Comment:
   It's better to use `@pytest.mark.skipif(tvm.support.libinfo()["USE_PROFILER"] != "ON", reason="TVM was not built with profiler support")`




[GitHub] [tvm] masahi merged pull request #11000: [Graph Debugger] Expose way to benchmark individual nodes.

Posted by GitBox <gi...@apache.org>.

masahi merged PR #11000:
URL: https://github.com/apache/tvm/pull/11000

