Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/06/24 11:45:37 UTC

[GitHub] [incubator-tvm] FrozenGene opened a new pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914


   During tuning, TVM lets the measured kernel occupy the cache fully and does not flush it between iterations. This causes problems in e2e testing, since arrays that tuning assumes stay resident in cache (i.e. weights) are evicted during e2e runs, which leads to lower performance. This has been demonstrated in Ansor.
   
   @merrymercy @tqchen @jcf94 @minminsun 
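
   For context, a hedged sketch of the flush-aware measurement this PR enables (parameter names follow the final state of this thread; `mod`, `ctx`, and `args` are assumed to be a loaded module, a CPU context, and prepared arguments):
   ```python
   # With the f_preproc hook, the registered CPU cache-flush function runs
   # before each repeat, so weights are timed cold, as in e2e runs.
   timer = mod.time_evaluator("myfunc", ctx, number=1, repeat=10,
                              f_preproc="cache_flush_cpu_non_first_arg")
   print(timer(*args).mean)
   ```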
   


[GitHub] [incubator-tvm] FrozenGene commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-651708824


   > How about we go with A1 for now, @FrozenGene can you update this PR to A1?
   
   @tqchen If we use
   ```python
   # cache_flush_packed is a packed func that performs the cpu cache flush
   cache_flush_packed = remote.get_function("cpu_cache_flush")(args_begin=1)
   # fprepare is a callback that will be called before the evaluation; it takes in args as arguments.
   f = mod.time_evaluator("myfunc", repeat=10, fprepare=cache_flush_packed)
   ```
   We will hit the `Cannot pass type FunctionHandle as an argument to the remote` error. Do you have any good suggestions?


[GitHub] [incubator-tvm] FrozenGene commented on a change in pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#discussion_r454023177



##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -473,8 +482,10 @@ def run_through_rpc(measure_input, build_result,
         remote.upload(build_result.filename)
         func = remote.load_module(os.path.split(build_result.filename)[1])
         ctx = remote.context(str(measure_input.target), 0)
+        f_prepare = 'cache_flush_cpu_non_first_arg' if enable_cpu_cache_flush else ''

Review comment:
       In the previous design I also wanted to pass the PackedFunc here directly, but that runs into the RPC problem mentioned here: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-654560411 So for now we can only pass the string to work around the RPC restriction.
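
   A minimal sketch of the lookup this string implies, assuming the symbol is registered as a global function (per this discussion, in `rpc_module.cc`): only the plain string crosses the RPC boundary, and each side resolves it to a PackedFunc locally.
   ```python
   import tvm

   # Resolve the registered flush function by name; an empty string means "no preproc".
   name = "cache_flush_cpu_non_first_arg"
   f_preproc = tvm.get_global_func(name) if name else None
   ```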




[GitHub] [incubator-tvm] FrozenGene commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-651477619


   > How about we go with A1 for now, @FrozenGene can you update this PR to A1?
   
   OK. I will update this PR to A1 next.


[GitHub] [incubator-tvm] tqchen merged pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914


   


[GitHub] [incubator-tvm] FrozenGene edited a comment on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-650891863


   I want to share the test results from my Skylake machine. The CPU is an Intel(R) Xeon(R) Silver 4114 @ 2.20GHz, and I use one core to tune. The tuning setup is the `tune_relay_x86.py` tutorial (without clflush) versus this PR's modification of `tune_relay_x86.py` (with clflush).
   
   Without cache flush:
   
   ```
   [Task  1/12]  Current/Best:   43.31/  44.04 GFLOPS | Progress: (800/800) | 1325.94 s Done.
   [Task  2/12]  Current/Best:    4.33/  44.02 GFLOPS | Progress: (720/720) | 1152.81 s Done.
   [Task  3/12]  Current/Best:   44.19/  65.08 GFLOPS | Progress: (972/972) | 1492.48 s Done.
   [Task  4/12]  Current/Best:   39.14/  57.74 GFLOPS | Progress: (864/864) | 1292.03 s Done.
   [Task  5/12]  Current/Best:   53.04/  66.45 GFLOPS | Progress: (1024/1024) | 1584.45 s Done.
   [Task  6/12]  Current/Best:   42.93/  54.29 GFLOPS | Progress: (896/896) | 1382.46 s Done.
   [Task  7/12]  Current/Best:    8.67/  63.83 GFLOPS | Progress: (980/980) | 1501.93 s Done.
   [Task  8/12]  Current/Best:   32.04/  57.19 GFLOPS | Progress: (308/308) | 484.76 s Done.
   [Task  9/12]  Current/Best:   12.25/  46.87 GFLOPS | Progress: (980/980) | 2161.56 s Done.
   [Task 10/12]  Current/Best:   41.17/  49.36 GFLOPS | Progress: (896/896) | 2174.74 s Done.
   [Task 11/12]  Current/Best:   17.24/  49.36 GFLOPS | Progress: (864/864) | 2075.64 s Done.
   [Task 12/12]  Current/Best:   23.43/  51.69 GFLOPS | Progress: (720/720) | 1708.31 s Done.
   ```
   
   With cache flush:
   ```
   [Task  1/12]  Current/Best:   41.26/  42.29 GFLOPS | Progress: (800/800) | 543.79 s Done.
   [Task  2/12]  Current/Best:    4.30/  41.93 GFLOPS | Progress: (720/720) | 338.14 s Done.
   [Task  3/12]  Current/Best:   43.09/  64.36 GFLOPS | Progress: (972/972) | 503.03 s Done.
   [Task  4/12]  Current/Best:   41.95/  56.23 GFLOPS | Progress: (864/864) | 350.40 s Done.
   [Task  5/12]  Current/Best:   52.39/  66.52 GFLOPS | Progress: (1024/1024) | 505.65 s Done.
   [Task  6/12]  Current/Best:   42.34/  53.17 GFLOPS | Progress: (896/896) | 353.18 s Done.
   [Task  7/12]  Current/Best:    8.38/  62.88 GFLOPS | Progress: (980/980) | 492.13 s Done.
   [Task  8/12]  Current/Best:   31.29/  57.12 GFLOPS | Progress: (308/308) | 166.95 s Done.
   [Task  9/12]  Current/Best:   12.36/  40.97 GFLOPS | Progress: (980/980) | 302.91 s Done.
   [Task 10/12]  Current/Best:   36.14/  41.85 GFLOPS | Progress: (896/896) | 264.56 s Done.
   [Task 11/12]  Current/Best:   16.24/  48.53 GFLOPS | Progress: (864/864) | 257.15 s Done.
   [Task 12/12]  Current/Best:   19.53/  47.11 GFLOPS | Progress: (720/720) | 212.18 s Done.
   ```
   
   The end-to-end execution time is essentially unchanged:
   88.36 ms (w/ clflush) vs. 87.26 ms (w/o clflush).
   
   As you can see, with clflush almost every single layer's tuning GFLOPS is lower than without clflush, but the end-to-end result does not suffer. And with clflush we need much less tuning time, since each measurement only needs to be repeated 10 times (or even fewer): summed over the 12 tasks above, tuning time drops from about 18,337 s to about 4,290 s, roughly 4.3x less.
   
   As said before, once we have Winograd for CPU, this becomes even more important.
   
   
   
   
   


[GitHub] [incubator-tvm] FrozenGene commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-657976125


   @comaniac I have updated the doc and changed the name from `f_prepare` to `f_preproc`. Could you help review it again?


[GitHub] [incubator-tvm] comaniac commented on a change in pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#discussion_r454034185



##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -473,8 +482,10 @@ def run_through_rpc(measure_input, build_result,
         remote.upload(build_result.filename)
         func = remote.load_module(os.path.split(build_result.filename)[1])
         ctx = remote.context(str(measure_input.target), 0)
+        f_prepare = 'cache_flush_cpu_non_first_arg' if enable_cpu_cache_flush else ''

Review comment:
       Ah, sorry for missing the previous comment. Then we can leave it for now. Meanwhile, do you guys think it would be better to add a comment documenting this restriction?




[GitHub] [incubator-tvm] comaniac commented on a change in pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#discussion_r453779891



##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -309,7 +313,8 @@ class LocalRunner(RPCRunner):
         Whether check correctness after measurement. This will use llvm cpu target to
         call your template and get the reference output.
         This can work for TOPI templates, but may not work for your custom template.
-
+    enable_cpu_cache_flush: bool
+        Whether to enable cpu cache flush

Review comment:
       Maybe explicitly mention that this has no effect on GPU tasks?

##########
File path: python/tvm/runtime/module.py
##########
@@ -163,7 +163,7 @@ def save(self, file_name, fmt=""):
         """
         _ffi_api.ModuleSaveToFile(self, file_name, fmt)
 
-    def time_evaluator(self, func_name, ctx, number=10, repeat=1, min_repeat_ms=0):
+    def time_evaluator(self, func_name, ctx, number=10, repeat=1, min_repeat_ms=0, f_prepare=''):

Review comment:
       Would `f_preproc` be more clear?

##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -454,6 +461,8 @@ def run_through_rpc(measure_input, build_result,
         The reference input used for checking correctness
     ref_output: List of np.ndarray
         The reference output used for checking correctness
+    enable_cpu_cache_flush: bool
+        Whether to enable cpu cache flush

Review comment:
       ditto

##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -473,8 +482,10 @@ def run_through_rpc(measure_input, build_result,
         remote.upload(build_result.filename)
         func = remote.load_module(os.path.split(build_result.filename)[1])
         ctx = remote.context(str(measure_input.target), 0)
+        f_prepare = 'cache_flush_cpu_non_first_arg' if enable_cpu_cache_flush else ''

Review comment:
       The processing flow of this function seems a bit long:
   1. setting `enable_cpu_cache_flush` to True
   2. setting `f_prepare` to "cache_flush_cpu_non_first_arg"
   3. setting `pf_prepare` to the packed function
   
       To me, it's hard for people to realize "cache_flush_cpu_non_first_arg" is the global function symbol unless they trace back to `rpc_module.cc`. Is it possible to get the packed function here directly, so that we can set the type of `f_prepare` to `PackedFunc`? That would also be more general for its description (the prepare function), IMHO.
   

##########
File path: tutorials/autotvm/tune_relay_x86.py
##########
@@ -122,8 +122,9 @@ def get_network(name, batch_size):
 
     'measure_option': autotvm.measure_option(
         builder=autotvm.LocalBuilder(),
-        runner=autotvm.LocalRunner(number=10, repeat=1,
-                                   min_repeat_ms=1000),
+        runner=autotvm.LocalRunner(number=1, repeat=10,
+                                   min_repeat_ms=0,
+                                   enable_cpu_cache_flush=True),

Review comment:
       Might be better to discuss `enable_cpu_cache_flush` in this tutorial to explain 1) this is only for CPU, and 2) how we set number/repeat/min_repeat_ms when this option is enabled.




[GitHub] [incubator-tvm] tqchen commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-648903111


   I like the overall util for cache flushing; however, it would be great to discuss the interface for cache eviction. In terms of the API choices:
   
   - While it is understandable that we would like to keep the first argument (activation) resident and flush the rest of the arguments, it is still a very specific setup (ideally it should be configurable).
   - Right now things are configured through an environment variable; is that the best way to configure the API?
   - The current logic does not check for contexts other than CPU, and will result in undefined behavior when we use OpenCL or CUDA (because the opaque data pointer does not correspond to a CPU address); it might also cause problems when an argument is not a DLTensor.
   
   Here are a few alternative API choices for configuring the cache flushing behavior.
   
   ### A0: Fold cache flushing factor into time_evaluator
   
   ```python
   mod = load_module()
   # flush cpu cache of args 1
   f = mod.time_evaluator("myfunc", repeat=10, cache_flush_cpu_args_begin=1)
   ```
   
   
   ### A1: Decoupled Composite style
   ```python
   mod = load_module()
   # cache flush packed is a packed func that performs the cpu cache flush
   cache_flush_packed = remote.get_function("cpu_cache_flush")(args_begin=1)
   # fprepare is a callback that will be called before the evaluation; it takes in args as arguments.
   f = mod.time_evaluator("myfunc", repeat=10, fprepare=cache_flush_packed)
   ```
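
   A rough sketch of what an A1-style `fprepare` callback could look like on the user side (`flush_cpu_cache` is a hypothetical helper; the merged PR ends up passing a registered function name instead):
   ```python
   def fprepare(*args):
       # Keep the first argument (activation) hot and flush the rest,
       # mirroring the "non first arg" flush behavior discussed here.
       for arg in args[1:]:
           flush_cpu_cache(arg)  # hypothetical flush helper
   ```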
   
   


[GitHub] [incubator-tvm] FrozenGene commented on a change in pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#discussion_r454024398



##########
File path: python/tvm/runtime/module.py
##########
@@ -163,7 +163,7 @@ def save(self, file_name, fmt=""):
         """
         _ffi_api.ModuleSaveToFile(self, file_name, fmt)
 
-    def time_evaluator(self, func_name, ctx, number=10, repeat=1, min_repeat_ms=0):
+    def time_evaluator(self, func_name, ctx, number=10, repeat=1, min_repeat_ms=0, f_prepare=''):

Review comment:
       I agree with you.




[GitHub] [incubator-tvm] junrushao1994 commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-648948620


   I am not in favor of setting environment variables. This is like an even worse global variable and is largely unsafe. If we really need a flag, we probably want a global atomic flag or a thread-local flag. Do not use environment variables if we can avoid it. Thank you!
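
   A minimal sketch of the thread-local alternative suggested here (illustrative only; the PR ultimately exposes an explicit option instead):
   ```python
   import threading

   # Per-thread measurement state instead of a process-wide environment variable.
   _measure_state = threading.local()
   _measure_state.enable_cpu_cache_flush = False
   ```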


[GitHub] [incubator-tvm] FrozenGene commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-650738036


   > I agree that the cache flush mechanism is useful in getting preciser measurement. It would be great if @FrozenGene can provide some experimental data to further assure.
   > 
   > I vote for folding cache flushing factor into time_evaluator for succinctness. And making it more configurable and generic sounds good to me.
   
   @yidawang  The previous experimental data is mostly from Ansor, especially x86 Winograd. For a Winograd workload of 1x7x7x512x512, the single-op tuning time could reach 0.11 ms (on one Skylake machine with AVX-512), but executed end-to-end this op could even cost several ms (sorry, I lost that number and only recorded the 0.11 ms). The issue is the constant matrix and weight (for example, a 3x3x512x512 weight becomes 6x6x512x512 if the tile size is 4).
   
   Another benefit of adding clflush is that we no longer need a large min_repeat_ms (like 1000), because we can measure very precisely. In this PR we even set repeat to only 10, so we can reduce our tuning time.
   
   I am collecting AutoTVM ResNet-18 data on one Skylake machine and will share it as soon as it completes.


[GitHub] [incubator-tvm] yidawang commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-654578679


   @tqchen Sorry to drop the ball as I was out the entire last week. The current implementation looks good to me.


[GitHub] [incubator-tvm] tqchen commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-648903588


   cc @jwfromm @junrushao1994 @merrymercy @yidawang @jcf94 for quick discussion


[GitHub] [incubator-tvm] yidawang commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-649001621


   I agree that the cache flush mechanism is useful for getting more precise measurements. It would be great if @FrozenGene can provide some experimental data to further confirm this.
   
   I vote for folding the cache flushing factor into time_evaluator for succinctness. And making it more configurable and generic sounds good to me.


[GitHub] [incubator-tvm] FrozenGene commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-654758540


   > Ah, you are right, right now it is hard to pass function as an argument to the remote because it is wrapped under the std::function, we could lift the restriction later once we fold the PackedFunc as an object.
   > 
   > Right now we might need to pass fprepare by string as its name which skips the first argument by default.
   > 
   > ```python
   > f = mod.time_evaluator("myfunc", repeat=10, fprepare="cache_flush_cpu_non_first_arg")
   > ```
   > 
   > It is still a bit more flexible than A0, as it enables multiple cache flush options but less flexible than A1. The migration to A1 possible though later.
   
   Another point I want to raise for a quick round of discussion: when we tune single-op / single-layer performance, we don't want to flush the cache, so that we can get maximum performance. The previous PR's environment variable could control this easily (set it when tuning a network, leave it unset when tuning a single layer). If we fold it into time_evaluator, how do we distinguish these two cases? One method I can come up with is adding an extra tuning option (like `enable_cache_flush`, defaulting to False) for measurement. Do you have better suggestions?


[GitHub] [incubator-tvm] comaniac commented on a change in pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#discussion_r454043821



##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -473,8 +482,10 @@ def run_through_rpc(measure_input, build_result,
         remote.upload(build_result.filename)
         func = remote.load_module(os.path.split(build_result.filename)[1])
         ctx = remote.context(str(measure_input.target), 0)
+        f_prepare = 'cache_flush_cpu_non_first_arg' if enable_cpu_cache_flush else ''

Review comment:
       I'm fine with it. Thanks




[GitHub] [incubator-tvm] comaniac commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-657985296


   > @comaniac I have updated the doc and change the name from `f_prepare` to `f_preproc`. Could you help to review it again?
   
   Didn't see the update on my side. Will check it later again.


[GitHub] [incubator-tvm] tqchen commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-650795226


   I agree that A1 is more modular. @yidawang, do you have a strong preference for A0?


[GitHub] [incubator-tvm] FrozenGene commented on a change in pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#discussion_r454024307



##########
File path: tutorials/autotvm/tune_relay_x86.py
##########
@@ -122,8 +122,9 @@ def get_network(name, batch_size):
 
     'measure_option': autotvm.measure_option(
         builder=autotvm.LocalBuilder(),
-        runner=autotvm.LocalRunner(number=10, repeat=1,
-                                   min_repeat_ms=1000),
+        runner=autotvm.LocalRunner(number=1, repeat=10,
+                                   min_repeat_ms=0,
+                                   enable_cpu_cache_flush=True),

Review comment:
       I will update it.




[GitHub] [incubator-tvm] FrozenGene commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-653754843


   > > How about we go with A1 for now, @FrozenGene can you update this PR to A1?
   > 
   > @tqchen If we use
   > 
   > ```python
   > # cache flush packed is a packed func that performs the cpu cache flush
   > cache_flush_packed = remote.get_function("cpu_cache_flush")(args_begin=1)
   > # fprepare is a callback that will be called before the evaluation, it takes in args as arguments. 
   > f = mod.time_evaluator("myfunc", repeat=10, fprepare=cache_flush_packed)
   > ```
   > 
   > We will hit the `Cannot pass type FunctionHandle as an argument to the remote` error. Do you have any good suggestions?
   > 
   > Related Code (see `Wrong` and `Pass` part):
   > 
   > ```c++
   > TVM_REGISTER_GLOBAL("runtime.RPCTimeEvaluator")
   >     .set_body_typed([](Optional<Module> opt_mod, std::string name, int device_type, int device_id,
   >                        int number, int repeat, int min_repeat_ms, PackedFunc f_prepare) {
   >       TVMContext ctx;
   >       ctx.device_type = static_cast<DLDeviceType>(device_type);
   >       ctx.device_id = device_id;
   >       if (opt_mod.defined()) {
   >         Module m = opt_mod.value();
   >         std::string tkey = m->type_key();
   >         if (tkey == "rpc") {
   >           // Wrong
   >           return static_cast<RPCModuleNode*>(m.operator->())
   >               ->GetTimeEvaluator(name, ctx, number, repeat, min_repeat_ms, f_prepare);
   >           // Pass
   >           ctx.device_type = static_cast<DLDeviceType>(ctx.device_type % kRPCSessMask);
   >           return WrapTimeEvaluator(m.GetFunction(name, false), ctx, number, repeat, min_repeat_ms, f_prepare);
   >         } else {
   >           return WrapTimeEvaluator(m.GetFunction(name, false), ctx, number, repeat, min_repeat_ms, f_prepare);
   >         }
   >       }
   > ```
   
   Do you have any idea about it? @tqchen 


[GitHub] [incubator-tvm] FrozenGene commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-658515980


   @merrymercy I have updated the doc. For `enable_cpu_cache_flush`, I combined the advice and added a comment noting that it only has an effect on CPU tasks. Could you spend some time reviewing it again? Thanks.


[GitHub] [incubator-tvm] tqchen edited a comment on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-659045843


   Thanks @FrozenGene @merrymercy @comaniac @yidawang 


[GitHub] [incubator-tvm] FrozenGene commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-657994138


   > > @comaniac I have updated the doc and change the name from `f_prepare` to `f_preproc`. Could you help to review it again?
   > 
   > Didn't see the update on my side. Will check it later again.
   
   I can see it now on my side. It should work on your side too?


[GitHub] [incubator-tvm] FrozenGene edited a comment on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-650674951


   I was on vacation several days ago, sorry for replying a little late. For this, I also want a quick round of discussion. For single-op or Ansor single-subgraph benchmarks, we don't want to turn on cache flush, so that we can unleash their maximum performance; for network benchmarks, however, we do want to turn on cache flush for the reasons mentioned. So maybe A1 makes this task easier, if everyone agrees we turn off cache flush for single-op benchmarks?
   
   I agree we should provide an API for this: if we want to support remote devices (like ARM), environment variables cannot control them from our local machine.


[GitHub] [incubator-tvm] tqchen commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-650205070


   @FrozenGene please follow up


[GitHub] [incubator-tvm] merrymercy commented on a change in pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#discussion_r454630575



##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -180,12 +180,14 @@ class RPCRunner(Runner):
         Whether check correctness after measurement. This will use llvm cpu target to
         call your template and get the reference output.
         This can work for TOPI templates, but may not work for your custom template.
+    enable_cpu_cache_flush: bool
+        Whether to enable cpu cache flush, which only has effect on CPU task.

Review comment:
       ```suggestion
           Whether to flush cache on CPU between repeated measurements.
           Flushing cache can make the measured latency of one operator closer to 
           its actual latency during end-to-end inference.
           To make this option effective, the argument `number` should also be set to 1.
   ```

##########
File path: tutorials/autotvm/tune_relay_x86.py
##########
@@ -114,6 +114,17 @@ def get_network(name, batch_size):
 # We will use local mode for tuning configuration. RPC tracker
 # mode can be setup similarly to the approach in
 # :ref:`tune_relay_arm` tutorial.
+#
+# In the measure option, we turn on enable_cpu_cache_flush to
+# get more precise measurement. When we turn it on, we don't
+# need to set min_repeat_ms to dynamically adjust to run op
+# many times so that we get precise measurement as when we
+# have cache flush, we could get precise measurement even we
+# run serveral times. So, we could just set number be 1 and
+# repeat be 10 to run only 10 times. The reason we set number be 1
+# is we will turn on cache flush before every repeat run in
+# internal implementation. So if number is greater than 1, the
+# cache flush effect will be probably invalid.

Review comment:
       This paragraph explains the motivation of this PR, but it is very hard for new users to understand.
   Because this tutorial is for new users with little TVM experience, I suggest the following modification.
   

##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -309,7 +313,8 @@ class LocalRunner(RPCRunner):
         Whether check correctness after measurement. This will use llvm cpu target to
         call your template and get the reference output.
         This can work for TOPI templates, but may not work for your custom template.
-
+    enable_cpu_cache_flush: bool
+        Whether to enable cpu cache flush, which only has effect on CPU task.

Review comment:
       ```suggestion
           Whether to flush cache on CPU between repeated measurements.
           Flushing cache can make the measured latency of one operator closer to 
           its actual latency during end-to-end inference.
           To make this option effective, the argument `number` should also be set to 1.
   ```

##########
File path: tutorials/autotvm/tune_relay_x86.py
##########
@@ -114,6 +114,17 @@ def get_network(name, batch_size):
 # We will use local mode for tuning configuration. RPC tracker
 # mode can be setup similarly to the approach in
 # :ref:`tune_relay_arm` tutorial.
+#
+# In the measure option, we turn on enable_cpu_cache_flush to
+# get more precise measurement. When we turn it on, we don't
+# need to set min_repeat_ms to dynamically adjust to run op
+# many times so that we get precise measurement as when we
+# have cache flush, we could get precise measurement even we
+# run serveral times. So, we could just set number be 1 and
+# repeat be 10 to run only 10 times. The reason we set number be 1
+# is we will turn on cache flush before every repeat run in
+# internal implementation. So if number is greater than 1, the
+# cache flush effect will be probably invalid.

Review comment:
       ```suggestion
   # To perform a precise measurement, we should repeat the measurement several times and
   # use the average of results.  In addition, we need to flush the cache for the weight tensors
   # between repeated measurements. This can make the measured latency of one operator 
   # closer to its actual latency during end-to-end inference.
   ```




[GitHub] [incubator-tvm] FrozenGene commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-657533357


   @yidawang @tqchen The code is updated. Please help review it again. Currently, when we tune a network, we can set the `enable_cpu_cache_flush` option to true to turn on cache flush; the default value is false. I'd like to hear your opinions.
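
   A short sketch of how this option is used, following the tutorial diff in this PR (shown with LocalRunner; the same flag is added to RPCRunner):
   ```python
   from tvm import autotvm

   measure_option = autotvm.measure_option(
       builder=autotvm.LocalBuilder(),
       runner=autotvm.LocalRunner(
           number=1,           # the flush runs before each repeat, so keep number=1
           repeat=10,
           min_repeat_ms=0,    # precise timing no longer needs a long minimum duration
           enable_cpu_cache_flush=True,  # only has an effect on CPU tasks
       ),
   )
   ```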


[GitHub] [incubator-tvm] tqchen commented on pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-651200012


   How about we go with A1 for now, @FrozenGene can you update this PR to A1?


[GitHub] [incubator-tvm] FrozenGene commented on a change in pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#discussion_r454023755



##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -309,7 +313,8 @@ class LocalRunner(RPCRunner):
         Whether check correctness after measurement. This will use llvm cpu target to
         call your template and get the reference output.
         This can work for TOPI templates, but may not work for your custom template.
-
+    enable_cpu_cache_flush: bool
+        Whether to enable cpu cache flush

Review comment:
       Thanks, I will update the doc.




[GitHub] [incubator-tvm] FrozenGene commented on a change in pull request #5914: [clflush] Enable x86 cpu cache flush

URL: https://github.com/apache/incubator-tvm/pull/5914#discussion_r454039357



##########
File path: python/tvm/autotvm/measure/measure_methods.py
##########
@@ -473,8 +482,10 @@ def run_through_rpc(measure_input, build_result,
         remote.upload(build_result.filename)
         func = remote.load_module(os.path.split(build_result.filename)[1])
         ctx = remote.context(str(measure_input.target), 0)
+        f_prepare = 'cache_flush_cpu_non_first_arg' if enable_cpu_cache_flush else ''

Review comment:
       I can add it at this place in measure_methods.py. Does that place work for you?



