Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/07/14 20:51:59 UTC

[GitHub] [incubator-tvm] merrymercy commented on a change in pull request #5914: [clflush] Enable x86 cpu cache flush

merrymercy commented on a change in pull request #5914:
URL: https://github.com/apache/incubator-tvm/pull/5914#discussion_r454630575



##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -180,12 +180,14 @@ class RPCRunner(Runner):
         Whether check correctness after measurement. This will use llvm cpu target to
         call your template and get the reference output.
         This can work for TOPI templates, but may not work for your custom template.
+    enable_cpu_cache_flush: bool
+        Whether to enable cpu cache flush, which only has effect on CPU task.

Review comment:
       ```suggestion
           Whether to flush the cache on the CPU between repeated measurements.
           Flushing the cache can make the measured latency of one operator closer to
           its actual latency during end-to-end inference.
           To make this option effective, the argument `number` should also be set to 1.
   ```
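
The requirement that `number` be 1 can be illustrated with a small standalone simulation (hypothetical latencies, no TVM required): the flush happens once per `repeat`, so with `number > 1` every run after the first within a repeat sees a warm cache and pulls the average down.

   ```python
   # Toy model of the timing loop: the cache is flushed once per repeat,
   # then the kernel runs `number` times back-to-back and the times are averaged.
   COLD_MS, WARM_MS = 2.0, 1.0  # hypothetical cold- and warm-cache latencies

   def measured_latency(number, repeat):
       per_repeat = []
       for _ in range(repeat):
           # flush happens here: only the first run is cold, the rest are warm
           runs = [COLD_MS] + [WARM_MS] * (number - 1)
           per_repeat.append(sum(runs) / number)
       return sum(per_repeat) / repeat

   print(measured_latency(1, 10))   # 2.0 -- the true cold-cache latency
   print(measured_latency(10, 10))  # 1.1 -- warm runs dilute the measurement
   ```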

##########
File path: tutorials/autotvm/tune_relay_x86.py
##########
@@ -114,6 +114,17 @@ def get_network(name, batch_size):
 # We will use local mode for tuning configuration. RPC tracker
 # mode can be setup similarly to the approach in
 # :ref:`tune_relay_arm` tutorial.
+#
+# In the measure option, we turn on enable_cpu_cache_flush to
+# get more precise measurement. When we turn it on, we don't
+# need to set min_repeat_ms to dynamically adjust to run op
+# many times so that we get precise measurement as when we
+# have cache flush, we could get precise measurement even we
+# run serveral times. So, we could just set number be 1 and
+# repeat be 10 to run only 10 times. The reason we set number be 1
+# is we will turn on cache flush before every repeat run in
+# internal implementation. So if number is greater than 1, the
+# cache flush effect will be probably invalid.

Review comment:
       This paragraph explains the motivation of this PR, but it is very hard for new users to understand.
   Because this tutorial targets new users with little TVM experience, I suggest the following modification.
   

##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -309,7 +313,8 @@ class LocalRunner(RPCRunner):
         Whether check correctness after measurement. This will use llvm cpu target to
         call your template and get the reference output.
         This can work for TOPI templates, but may not work for your custom template.
-
+    enable_cpu_cache_flush: bool
+        Whether to enable cpu cache flush, which only has effect on CPU task.

Review comment:
       ```suggestion
           Whether to flush the cache on the CPU between repeated measurements.
           Flushing the cache can make the measured latency of one operator closer to
           its actual latency during end-to-end inference.
           To make this option effective, the argument `number` should also be set to 1.
   ```

##########
File path: tutorials/autotvm/tune_relay_x86.py
##########
@@ -114,6 +114,17 @@ def get_network(name, batch_size):
 # We will use local mode for tuning configuration. RPC tracker
 # mode can be setup similarly to the approach in
 # :ref:`tune_relay_arm` tutorial.
+#
+# In the measure option, we turn on enable_cpu_cache_flush to
+# get more precise measurement. When we turn it on, we don't
+# need to set min_repeat_ms to dynamically adjust to run op
+# many times so that we get precise measurement as when we
+# have cache flush, we could get precise measurement even we
+# run serveral times. So, we could just set number be 1 and
+# repeat be 10 to run only 10 times. The reason we set number be 1
+# is we will turn on cache flush before every repeat run in
+# internal implementation. So if number is greater than 1, the
+# cache flush effect will be probably invalid.

Review comment:
       ```suggestion
   # To perform a precise measurement, we should repeat the measurement several times and
   # use the average of the results. In addition, we need to flush the cache for the weight tensors
   # between repeated measurements. This can make the measured latency of one operator
   # closer to its actual latency during end-to-end inference.
   ```
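
For readers following the tutorial, the options discussed in this thread would be wired together roughly as below (a configuration sketch, assuming a TVM build that includes this PR; `LocalBuilder` and `LocalRunner` are the standard autotvm helpers):

   ```python
   from tvm import autotvm

   # number=1 so the cache is cold for every timed run;
   # repeat=10 to average over ten flushed measurements.
   measure_option = autotvm.measure_option(
       builder=autotvm.LocalBuilder(),
       runner=autotvm.LocalRunner(
           number=1,
           repeat=10,
           enable_cpu_cache_flush=True,
       ),
   )
   ```

This `measure_option` is then passed to the tuner as in the rest of the tutorial.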




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org