Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/10/12 19:45:20 UTC

[GitHub] [incubator-tvm] tkonolige opened a new pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

tkonolige opened a new pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671


   This PR fixes autotvm and the autoscheduler when using multiprocessing's spawn start method. I had to remove all nested function declarations, properly serialize auto_scheduler's `SearchTask`, and propagate registries to subprocesses. Unfortunately, the registry propagation does not work correctly when running under pytest. I've disabled those tests for now (`tests/python/unittest/test_runtime_rpc.py`), but they work if run directly. Maybe someone else can take a look at this.
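   For context, the reason nested function declarations had to be removed is that spawn must pickle worker functions, and pickle resolves them by qualified module-level name. A minimal stand-alone sketch (plain Python, not TVM code) of the failure mode:

```python
import pickle

def make_worker():
    def worker(x):  # nested: its qualified name contains "<locals>"
        return x * x
    return worker

def top_level_worker(x):  # module-level functions pickle by reference, so spawn can use them
    return x * x

# spawn must pickle the target function to send it to the child process;
# nested functions cannot be pickled by reference, so this fails
try:
    pickle.dumps(make_worker())
except (pickle.PicklingError, AttributeError) as err:
    print("cannot pickle nested function:", type(err).__name__)
```

   Moving every worker to module level sidesteps this, at the cost of passing state explicitly.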
   
   See #6650 
   
   @merrymercy 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-708649133


   The problem with using fork is that it is unsafe on macOS or when using the CUDA API. The PyTorch developers recommend not using fork with CUDA: https://pytorch.org/docs/master/notes/multiprocessing.html#cuda-in-multiprocessing. It looks like we have to choose something that works over something that is fast.
   
   I also don't think it really matters that spawning is slower than forking. We are not spawning that many processes. And if the overhead from spawning is big enough to cause a significant slowdown, then we shouldn't have been using processes/threads in the first place.
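   One hedged sketch of the resulting policy (illustrative, not the PR's actual logic): prefer fork where it is safe, and fall back to spawn elsewhere.

```python
import multiprocessing
import platform

def safe_start_method():
    """Prefer fork on Linux for speed; use spawn everywhere else.

    fork is unsafe on macOS and once a CUDA context exists, while spawn
    works everywhere at the cost of interpreter startup per process.
    """
    if platform.system() == "Linux" and "fork" in multiprocessing.get_all_start_methods():
        return "fork"
    return "spawn"

method = safe_start_method()
ctx = multiprocessing.get_context(method)
print("using start method:", method)
```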





[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717506384


   I tested your autotvm implementation and found it is significantly slower.
   
   To do the test, you can run this tutorial (https://github.com/apache/incubator-tvm/blob/main/tutorials/autotvm/tune_conv2d_cuda.py) and replace `n_trial=20` with `n_trial=100`.
   
   The number we care about is the time spent on simulated annealing, which actually uses the xgboost_cost_model.py modified by you.
   
   This is the screenshot before your PR. You can see `elapsed: 63.40` in the last line, which means it takes 63.40 ms to do simulated annealing.
   ![image](https://user-images.githubusercontent.com/15100009/97355858-c9639c00-1854-11eb-8c45-17e7d8768985.png)
   
   However, with your PR, the number becomes 194.67
   
   <img width="1152" alt="Screen Shot 2020-10-27 at 12 47 15 PM" src="https://user-images.githubusercontent.com/15100009/97355886-cff21380-1854-11eb-9058-72eb1ac9f844.png">
   
   I don’t understand why the autotvm part is required for that PR, because autotvm and auto-scheduler are totally independent. Can you delete the autotvm files in that PR?
   
   The test machine is a 24-core Intel E5





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR solves the problem while not bringing any performance regression.
   But I am not sure about the autotvm part. Your test case is not correct: with `n-trials=16`, the code modified by you will not even be executed, because the cost model only starts to work after getting enough training samples (n-trials > 64). To do the benchmark correctly, you should use a larger matrix size (n=1024), run autotvm with n-trials > 64 and `XGBTuner`, and report the time used in simulated annealing (i.e., the part modified by you).
   Could you delete the autotvm part in this PR so we can merge this?





[GitHub] [incubator-tvm] tkonolige edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717560438


   I was thinking it was cloudpickle reloading the main module, but I removed the `__name__ == "__main__"` lines and it performs the same. `__name__ == "__main__"` is still required for non-Linux machines (i.e., the tutorials are buggy on non-Linux machines), but it isn't causing the issue here. The only remaining difference I can see is that your machine has so many threads. Can you reproduce your results on a different machine?
   
   Also, did you remove the tuning logs between runs?





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-708845921


   AutoTVM uses a lot of Python multiprocessing, and I expect it will be much slower when using spawn. AutoTVM uses multiprocessing for feature extraction, so it needs to launch about 50,000 tasks per measurement batch.
   
   The situation for Ansor is better, as Ansor does not rely on multiprocessing as heavily as AutoTVM does.
   However, I'd like to see benchmark numbers for fork vs. spawn,
   i.e., run a search with `n-trials=64` with (Ansor, AutoTVM) x (fork, spawn) x (CPU, GPU) (optional).
   The requirement from my side is that I don't want to see any performance regression.
   I prefer to only use the slower and safer fallback mechanism when the fast approach does not work.
   
   Removing all multiprocessing requires huge engineering effort.
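   The requested fork-vs-spawn numbers can be approximated with a small stand-alone harness (a sketch with illustrative names, not a TVM benchmark):

```python
import multiprocessing
import time

def _noop(_):
    return None

def time_pool(method, n_workers=2, n_tasks=32):
    """Time creating a pool with the given start method and mapping no-op tasks."""
    ctx = multiprocessing.get_context(method)
    start = time.perf_counter()
    with ctx.Pool(n_workers) as pool:
        pool.map(_noop, range(n_tasks))
    return time.perf_counter() - start

if __name__ == "__main__":
    # spawn pays a fresh-interpreter startup cost per worker that fork avoids
    for method in multiprocessing.get_all_start_methods():
        print(f"{method}: {time_pool(method):.3f}s")
```

   Whether spawn's fixed startup overhead matters in practice depends on how many processes a measurement batch actually launches.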





[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717392440


   Here are the autotvm times you requested. 1024x1024 with 100 trials.
   ```
   This PR
   autotvm       llvm 297.038928
   autotvm       cuda 15.138037
   
   main
   autotvm       llvm 297.375458
   autotvm       cuda 14.163432
   ```
   
   Results are very similar.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-708845921


   AutoTVM uses a lot of Python multiprocessing, and I expect it will be much slower when using spawn. AutoTVM uses multiprocessing for feature extraction, so it needs to launch about 50,000 tasks per measurement batch.
   
   The situation for Ansor is better, as Ansor does not rely on multiprocessing as heavily as AutoTVM does.
   However, I'd like to see benchmark numbers for fork vs. spawn,
   i.e., run a search with `n_trials=64` with (Ansor, AutoTVM) x (fork, spawn).
   The requirement from my side is that I don't want to see any performance regression.
   I prefer to only use the slower and safer fallback mechanism when the fast approach does not work.
   
   Removing all multiprocessing requires huge engineering effort.





[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717369426


   @merrymercy The autotvm changes are required for things to work.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR solves the problem while not bringing any performance regression.
   But I am not sure about the autotvm part. Your test case is not correct: with `n-trials=16`, the code modified by you will not even be executed, because the cost model only starts to work after getting enough training data (n-trials > 64). To do the benchmark correctly, you should use a larger matrix size, run autotvm with n-trials > 64 and `XGBTuner`, and report the time used in simulated annealing (i.e., the part modified by you).
   Could you delete the autotvm part in this PR so we can merge this?





[GitHub] [incubator-tvm] tqchen edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717620222


   The Sphinx gallery will run all tutorials together, and it would still be useful to keep the tutorials in ipynb style (no main).





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717588164


   I added `__name__ == "__main__"` but the results are the same.
   
   Can you delete all autotvm-related code and tutorials in this PR? We can investigate later. In the meantime, can you try on a machine with more cores?
   Can you check whether your modifications to the tutorial break its format?
   I don't think the tutorial scripts can generate correct rst files after adding indentation to the text block.
   You can build the tutorials locally and check the generated HTML files.
   
   This PR and #6710 are conflicting. I want to merge this first so I can do the merge and cleanup on my PR. Otherwise, you will have to do more work yourself.





[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-718075660


   Maybe we should at least leave a comment that they are broken when using the spawn method?
   





[GitHub] [incubator-tvm] tkonolige edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717560438


   I was thinking it was cloudpickle reloading the main module, but I removed the `__name__ == "__main__"` lines and it performs the same. `__name__ == "__main__"` is still required for non-Linux machines (i.e., the tutorials are buggy on non-Linux machines), but it isn't causing the issue here. The only remaining difference I can see is that your machine has so many threads. Can you reproduce your results on a different machine?





[GitHub] [incubator-tvm] tqchen commented on a change in pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#discussion_r511931651



##########
File path: python/tvm/autotvm/task/task.py
##########
@@ -173,6 +172,8 @@ def __getstate__(self):
         # some unpickable local task functions.
         # So we only pickle the name of the function
         # and restore the function by name when unpickling it.
+        import cloudpickle  # pylint: disable=import-outside-toplevel

Review comment:
       In this case I asked @tkonolige to import it here for now, since cloudpickle is an optional dependency introduced by autotuning at the moment.
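       The pickle-the-name approach described in the comment above the diff can be sketched in isolation (the registry and class names here are illustrative, not the actual TVM code):

```python
import pickle

TASK_REGISTRY = {}

def register_task(name):
    """Register a task function under a stable name so it can be restored by name."""
    def _wrap(func):
        TASK_REGISTRY[name] = func
        return func
    return _wrap

class Task:
    def __init__(self, func_name):
        self.func_name = func_name

    def __getstate__(self):
        # the task function itself may be an unpicklable local;
        # store only its registered name
        return {"func_name": self.func_name}

    def __setstate__(self, state):
        # restore by name; the registry resolves it back to the function
        self.func_name = state["func_name"]

    def run(self, *args):
        return TASK_REGISTRY[self.func_name](*args)

@register_task("add")
def add(a, b):
    return a + b

restored = pickle.loads(pickle.dumps(Task("add")))
print(restored.run(2, 3))
```

       This works only if the subprocess rebuilds the same registry, which is why registries must be propagated to spawned workers.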







[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717506384


   I tested your autotvm implementation and found it is significantly slower.
   
   To do the test, you can run this tutorial (https://github.com/apache/incubator-tvm/blob/main/tutorials/autotvm/tune_conv2d_cuda.py) and replace `n_trial=20` with `n_trial=100`.
   
   The number we care about is the time spent on simulated annealing, which actually uses the xgboost_cost_model.py modified by you.
   
   This is the screenshot before your PR. You can see `elapsed: 63.40` in the last line, which means it takes 63.40 ms to do simulated annealing.
   ![image](https://user-images.githubusercontent.com/15100009/97355858-c9639c00-1854-11eb-8c45-17e7d8768985.png)
   
   However, with your PR, the number becomes 194.67
   
   <img width="1152" alt="Screen Shot 2020-10-27 at 12 47 15 PM" src="https://user-images.githubusercontent.com/15100009/97355886-cff21380-1854-11eb-9058-72eb1ac9f844.png">
   
   I don’t understand why the autotvm part is required for that PR, because autotvm and auto-scheduler are totally independent. Can you delete the autotvm files in that PR?
   
   The test machine is a 24-core Intel E5





[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-714820720


   @merrymercy @tqchen A review on this would be great.





[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me.
   But I am also not sure about the autotvm part.
   If you don't mind, I suggest deleting the autotvm part in this PR.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717588164


   I added `__name__ == "__main__"` but the results are the same.
   
   Can you delete all autotvm-related code and tutorials in this PR? We can investigate later. In the meantime, can you try on a machine with more cores?
   Can you check whether your modifications to the tutorial break its format?
   I don't think the tutorials can generate correct rst files after adding indentation to the text block.
   
   This PR and #6710 are conflicting. I want to merge this first so I can do the merge and cleanup on my PR. Otherwise, you will have to do more work yourself.





[GitHub] [incubator-tvm] merrymercy commented on a change in pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on a change in pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#discussion_r511923166



##########
File path: python/tvm/auto_scheduler/measure.py
##########
@@ -590,14 +634,20 @@ def local_builder_build(inputs, timeout, n_parallel, build_func="default", verbo
     res : List[BuildResult]
         The build results of these MeasureInputs.
     """
-    # We use fork and a global variable to copy arguments between processes.
-    # This can avoid expensive serialization of TVM IR when using multiprocessing.Pool
-    global GLOBAL_BUILD_ARGUMENTS
-
-    GLOBAL_BUILD_ARGUMENTS = (inputs, build_func, timeout, verbose)
-
-    pool = NoDaemonPool(n_parallel)
-    tuple_res = pool.map(local_build_worker, range(len(inputs)))
+    # This pool is not doing computationally intensive work, so we can use threads
+    pool = multiprocessing.pool.ThreadPool(n_parallel)

Review comment:
       Do we still need serialization with `ThreadPool`? I guess all these arguments will be shared in memory and passed as references.
   
   If this is true, then we don't need to change any other part of the code in this file.
   In addition, we can choose between `ProcessingPool` and `ThreadPool` here according to the OS.
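   That intuition is easy to confirm outside TVM: `ThreadPool` workers run in the parent's address space, so arguments are passed by reference and nothing is serialized. A stand-alone sketch:

```python
from multiprocessing.pool import ThreadPool

# a structure we do not want to serialize (stand-in for TVM IR inputs)
shared_inputs = {"payload": list(range(8))}

def worker(i):
    # threads see the parent's objects directly: same id, no copy, no pickle
    return id(shared_inputs), shared_inputs["payload"][i]

with ThreadPool(4) as pool:
    results = pool.map(worker, range(8))

# every worker saw the very same object, confirming pass-by-reference
assert all(obj_id == id(shared_inputs) for obj_id, _ in results)
print("values seen by workers:", [v for _, v in results])
```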







[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716534539


   Another concern is that I introduced a new kind of workload in this PR (#6710, https://github.com/apache/incubator-tvm/blob/5d93c61896acafe5d0b76b70615f2e2823cbf3b2/python/tvm/auto_scheduler/workload_registry.py#L158) for relay integration.
   When using auto-scheduler in Relay, we don't serialize a ComputeDAG by function + arguments.
   We save a copy of the TE expressions generated by relay as the workload.
   So we have to support serializing TE expressions after this PR. But I am not sure whether it works because there are pointers in TE expressions.
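   The registry-propagation side of this can be sketched generically: snapshot the parent's registry and re-install it in each spawned worker via the pool initializer (names here are made up for illustration; this is not the TVM implementation):

```python
import multiprocessing

WORKLOAD_REGISTRY = {}

def init_worker(registry_snapshot):
    # spawn starts each child from a fresh interpreter, so the parent's
    # registry contents must be re-installed explicitly in every worker
    WORKLOAD_REGISTRY.update(registry_snapshot)

def lookup(key):
    return WORKLOAD_REGISTRY[key]

if __name__ == "__main__":
    WORKLOAD_REGISTRY["matmul"] = "serialized-workload-placeholder"
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(2, initializer=init_worker,
                  initargs=(dict(WORKLOAD_REGISTRY),)) as pool:
        print(pool.map(lookup, ["matmul", "matmul"]))
```

   Whatever is stored in the snapshot must itself be picklable, which is exactly where serializing TE expressions becomes the open question.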





[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716534539


   Another concern is that I introduced a new kind of workload in this PR (#6710) for relay integration.
   When using auto-scheduler in Relay, we don't serialize a ComputeDAG by function + arguments.
   We save a copy of the TE expressions generated by relay as the workload.
   So we have to support serializing TE expressions after this PR. I am not sure if it works because there are pointers in TE expressions.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-708845921


   AutoTVM uses a lot of Python multiprocessing, and I expect it will be much slower when using spawn. AutoTVM uses multiprocessing for feature extraction, so it needs to launch about 50,000 tasks per measurement batch.
   
   Ansor does not rely on multiprocessing too much, so the situation for Ansor is better.
   However, I'd like to see benchmark numbers for fork vs. spawn,
   i.e., run a search with `n-trials=64` with (Ansor, AutoTVM) x (fork, spawn) x (CPU, GPU) (optional).





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-708845921


   AutoTVM uses a lot of Python multiprocessing, and I expect it will be much slower when using spawn. AutoTVM uses multiprocessing for feature extraction, so it needs to launch about 50,000 tasks per measurement batch.
   
   Ansor does not rely on multiprocessing too much, so the situation for Ansor is better.
   However, I'd like to see benchmark numbers for fork vs. spawn,
   i.e., run a search with `n-trials=64` with (Ansor, AutoTVM) x (fork, spawn) x (CPU, GPU) (optional).
   The requirement from my side is that I don't want to see any performance regression.
   I prefer to only use the slower and safer fallback mechanism when the fast approach does not work.
   
   Removing all multiprocessing requires huge engineering effort.





[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-713161790


   @merrymercy spawn is definitely slower. With autotvm, spawn is about 50% slower. I figured out where the CUDA drivers were being called in forked threads and fixed it. Now fork should be used on Linux.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-708845921


   AutoTVM uses a lot of Python multiprocessing, and I expect it will be much slower when using spawn. AutoTVM uses multiprocessing for feature extraction, so it needs to launch about 50,000 tasks per measurement batch.
   
   Ansor does not rely on multiprocessing too much, so the situation for Ansor is better.
   However, I'd like to see benchmark numbers for fork vs. spawn,
   i.e., run a search with `n_trials=64` with (Ansor, AutoTVM) x (fork, spawn).
   The requirement from my side is that I don't want to see any performance regression.
   I prefer to only use the slower and safer fallback mechanism when the fast approach does not work.
   
   On the other hand, removing all multiprocessing requires some engineering effort, but that is not on my agenda for now.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716534539


   Another concern is that I introduced a new kind of workload in this PR (#6710, https://github.com/apache/incubator-tvm/blob/5d93c61896acafe5d0b76b70615f2e2823cbf3b2/python/tvm/auto_scheduler/workload_registry.py#L158) for relay integration.
   When using auto-scheduler in Relay, we don't serialize a ComputeDAG by function + arguments.
   We save a copy of the TE expressions generated by relay as the workload.
   So we have to support serializing TE expressions after this PR. But I am not sure if it works because there are pointers in TE expressions.





[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717540915


   What's your testing environment?


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717550912


   Sorry, could you point me to the related lines? I only see you rename some variables in `python/tvm/autotvm/measure/local_executor.py`


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717540915


   What's your testing environment (python version, CPU, OS, ...)?
   Why is the tutorial broken? I can run it successfully.


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-708845921


   AutoTVM uses a lot of Python multiprocessing and I expect it will be much slower when using spawn.  AutoTVM uses multiprocessing for feature extraction, so it needs to launch about 50,000 tasks every measurement batch.
   
   The situation for Ansor is better as Ansor does not rely on multiprocessing as heavily as AutoTVM does.
   However, I'd like to see benchmark numbers about fork vs. spawn.
   i.e. Run a search with `n_trials=64` with (Ansor, AutoTVM) x (fork, spawn).
   The requirement from my side is that I don't want to see any performance regression.
   I prefer to only use the slower and safer fallback mechanism when the fast approach does not work.
   
   On the other hand, removing all multiprocessing requires some engineering effort. But this is not on my agenda for now.


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy commented on a change in pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on a change in pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#discussion_r511899520



##########
File path: python/tvm/auto_scheduler/measure.py
##########
@@ -51,19 +51,52 @@
 from .loop_state import StateObject
 from .utils import (
     get_const_tuple,
-    NoDaemonPool,
     call_func_with_timeout,
     request_remote,
     check_remote,
 )
+from .compute_dag import ComputeDAG
+from .search_task import SearchTask
+from .workload_registry import workload_name, get_workload
 
 # The maximum length of error message
 MAX_ERROR_MSG_LEN = 512
 
-# We use fork and a global variable to copy arguments between processes.
-# This can avoid expensive serialization of TVM IR when using multiprocessing.Pool
-GLOBAL_BUILD_ARGUMENTS = None
-GLOBAL_RUN_ARGUMENTS = None
+
+def recover_measure_input(inp, rebuild_state=False):
+    """
+    Recover a deserialized MeasureInput by rebuilding the missing fields.
+    1. Rebuild the compute_dag in inp.task
+    2. (Optional) Rebuild the stages in inp.state
+
+    Parameters
+    ----------
+    inp: MeasureInput
+        The deserialized MeasureInput
+    rebuild_state: bool = False
+        Whether to rebuild the stages in MeasureInput.State
+
+    Returns
+    -------
+    new_input: MeasureInput
+        The fully recovered MeasureInput with all fields rebuilt.
+    """
+    task = inp.task
+    print(task.hardware_params)

Review comment:
       delete this

##########
File path: python/tvm/auto_scheduler/measure.py
##########
@@ -950,26 +1017,30 @@ def rpc_runner_run(
     res : List[MeasureResult]
         The measure results of these MeasureInputs.
     """
-    global GLOBAL_RUN_ARGUMENTS
-    GLOBAL_RUN_ARGUMENTS = (
-        inputs,
-        build_results,
-        key,
-        host,
-        port,
-        priority,
-        timeout,
-        number,
-        repeat,
-        min_repeat_ms,
-        cooldown_interval,
-        enable_cpu_cache_flush,
-        verbose,
-    )
-
     assert len(inputs) == len(build_results), "Measure input size should be equal to build results"
-    pool = NoDaemonPool(n_parallel)
-    tuple_res = pool.map(rpc_run_worker, range(len(build_results)))
+    # This pool is not doing computationally intensive work, so we can use threads

Review comment:
       Did you benchmark the speed of ProcessingPool vs. ThreadPool?
   For the comment, is it "not doing computationally intensive work" or "not doing computationally intensive work in Python"?

##########
File path: python/tvm/auto_scheduler/utils.py
##########
@@ -169,17 +143,18 @@ def kill_child_processes(parent_pid, sig=signal.SIGTERM):
             return
 
 
+def _func_wrapper(que, func, args, kwargs):

Review comment:
       document it?

##########
File path: python/tvm/auto_scheduler/measure.py
##########
@@ -590,14 +634,20 @@ def local_builder_build(inputs, timeout, n_parallel, build_func="default", verbo
     res : List[BuildResult]
         The build results of these MeasureInputs.
     """
-    # We use fork and a global variable to copy arguments between processes.
-    # This can avoid expensive serialization of TVM IR when using multiprocessing.Pool
-    global GLOBAL_BUILD_ARGUMENTS
-
-    GLOBAL_BUILD_ARGUMENTS = (inputs, build_func, timeout, verbose)
-
-    pool = NoDaemonPool(n_parallel)
-    tuple_res = pool.map(local_build_worker, range(len(inputs)))
+    # This pool is not doing computationally intensive work, so we can use threads
+    pool = multiprocessing.pool.ThreadPool(n_parallel)

Review comment:
       Do we still need serialization with `ThreadPool`?
   I guess all these arguments will be shared in memory and will be passed as references.
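   For what it's worth, `ThreadPool` does pass arguments by reference within the same address space, with no pickling at all; a small standalone check (generic Python, unrelated to the TVM code above) demonstrates this:

```python
from multiprocessing.pool import ThreadPool


class Unpicklable:
    """An object that pickle would reject, proving no serialization happens."""

    def __reduce__(self):
        raise TypeError("not picklable")


def worker(obj):
    # Receives the exact same object, not a deserialized copy.
    return id(obj)


obj = Unpicklable()
with ThreadPool(2) as pool:
    ids = pool.map(worker, [obj, obj])

# Both tasks saw the identical in-memory object.
assert all(i == id(obj) for i in ids)
```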

##########
File path: python/tvm/auto_scheduler/measure.py
##########
@@ -87,6 +120,24 @@ def __init__(self, task, state):
         state = state if isinstance(state, StateObject) else state.state_object
         self.__init_handle_by_constructor__(_ffi_api.MeasureInput, task, state)
 
+    def serialize(self):

Review comment:
       Can we use `__getstate__`?
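   For context, the standard pickle hooks being suggested follow this protocol; the class below is a toy stand-in, not the actual `MeasureInput` implementation:

```python
import pickle


class MeasureInputLike:
    """Toy stand-in showing the __getstate__/__setstate__ protocol."""

    def __init__(self, task, state):
        self.task = task
        self.state = state

    def __getstate__(self):
        # Return only picklable pieces; heavy fields could be reduced
        # to lookup keys here and rebuilt in __setstate__.
        return {"task": self.task, "state": self.state}

    def __setstate__(self, d):
        self.task = d["task"]
        self.state = d["state"]


inp = MeasureInputLike("matmul", [1, 2, 3])
copy = pickle.loads(pickle.dumps(inp))
assert copy.task == "matmul" and copy.state == [1, 2, 3]
```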

##########
File path: python/tvm/autotvm/task/task.py
##########
@@ -173,6 +172,8 @@ def __getstate__(self):
         # some unpickable local task functions.
         # So we only pickle the name of the function
         # and restore the function by name when unpickling it.
+        import cloudpickle  # pylint: disable=import-outside-toplevel

Review comment:
       Why not import on the top-level?

##########
File path: python/tvm/autotvm/tuner/xgboost_cost_model.py
##########
@@ -321,10 +316,11 @@ def _get_feature(self, indexes):
 
         indexes = np.array(indexes)
         need_extract = [x for x in indexes if x not in fea_cache]
+        args = [(self.space.get(x), self.target, self.task) for x in need_extract]

Review comment:
       Doing this still needs serializing a lot of things. Did you test the performance before vs. after?

##########
File path: src/auto_scheduler/measure_record.cc
##########
@@ -107,18 +107,68 @@ struct Handler<::tvm::auto_scheduler::StateNode> {
   }
 };
 
+template <>
+struct Handler<::tvm::auto_scheduler::HardwareParamsNode> {
+  inline static void Write(dmlc::JSONWriter* writer,
+                           const ::tvm::auto_scheduler::HardwareParamsNode& data) {
+    writer->BeginArray(false);
+    writer->WriteArrayItem(data.num_cores);
+    writer->WriteArrayItem(data.vector_unit_bytes);
+    writer->WriteArrayItem(data.cache_line_bytes);
+    writer->WriteArrayItem(data.max_shared_memory_per_block);
+    writer->WriteArrayItem(data.max_registers_per_block);
+    writer->WriteArrayItem(data.max_threads_per_block);
+    writer->WriteArrayItem(data.max_vthread_extent);
+    writer->WriteArrayItem(data.warp_size);
+    writer->EndArray();
+  }
+  inline static void Read(dmlc::JSONReader* reader,
+                          ::tvm::auto_scheduler::HardwareParamsNode* data) {
+    bool s;
+    reader->BeginArray();
+    s = reader->NextArrayItem();
+    CHECK(s);
+    reader->Read(&data->num_cores);
+    s = reader->NextArrayItem();
+    CHECK(s);
+    reader->Read(&data->vector_unit_bytes);
+    s = reader->NextArrayItem();
+    CHECK(s);
+    reader->Read(&data->cache_line_bytes);
+    s = reader->NextArrayItem();
+    CHECK(s);
+    reader->Read(&data->max_shared_memory_per_block);
+    s = reader->NextArrayItem();
+    CHECK(s);
+    reader->Read(&data->max_registers_per_block);
+    s = reader->NextArrayItem();
+    CHECK(s);
+    reader->Read(&data->max_threads_per_block);
+    s = reader->NextArrayItem();
+    CHECK(s);
+    reader->Read(&data->max_vthread_extent);
+    s = reader->NextArrayItem();
+    CHECK(s);
+    reader->Read(&data->warp_size);
+    s = reader->NextArrayItem();
+    CHECK(!s);
+  }
+};
+
 template <>
 struct Handler<::tvm::auto_scheduler::SearchTaskNode> {
   inline static void Write(dmlc::JSONWriter* writer,
                            const ::tvm::auto_scheduler::SearchTaskNode& data) {
     writer->BeginArray(false);
     writer->WriteArrayItem(std::string(data.workload_key));
     writer->WriteArrayItem(data.target->str());
+    writer->WriteArrayItem(*data.hardware_params.get());

Review comment:
       Since you changed the format, please update the version number to `v0.3`
   https://github.com/apache/incubator-tvm/blob/c6f18250e176a0f107481d489651f6abdfe00976/src/auto_scheduler/measure_record.cc#L219




----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717506384


   I tested your autotvm implementation and found it is significantly slower.
   
   To do the test, you can run this tutorial (https://github.com/apache/incubator-tvm/blob/main/tutorials/autotvm/tune_conv2d_cuda.py) and replace the n_trial=20 with n_trial=100
   
   The number we care about is the time spent on simulated annealing which actually uses the xgboost_cost_model.py modified by you.
   
   This is the screenshot before your PR. You can see `elapsed: 63.40` in the last line, which means it takes 63.40 seconds to do simulated annealing.
   ![image](https://user-images.githubusercontent.com/15100009/97355858-c9639c00-1854-11eb-8c45-17e7d8768985.png)
   
   However, with your PR, the number becomes 194.67
   
   <img width="1152" alt="Screen Shot 2020-10-27 at 12 47 15 PM" src="https://user-images.githubusercontent.com/15100009/97355886-cff21380-1854-11eb-9058-72eb1ac9f844.png">
   
   I don’t understand why the autotvm part is required for that PR, because autotvm and auto-scheduler are totally independent.  Can you delete autotvm files in that PR?
   
   The test machine is a 24-core Intel E5


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717540915


   What's your testing environment?
   Why is the tutorial broken? I can run it successfully.


----------------------------------------------------------------



[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#discussion_r512090033



##########
File path: python/tvm/auto_scheduler/measure.py
##########
@@ -87,6 +120,24 @@ def __init__(self, task, state):
         state = state if isinstance(state, StateObject) else state.state_object
         self.__init_handle_by_constructor__(_ffi_api.MeasureInput, task, state)
 
+    def serialize(self):

Review comment:
       There was some weird initialization bug with `__getstate__` that I could not figure out. I'll add a comment about this.




----------------------------------------------------------------



[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717536336


   @merrymercy The goal of this PR is to make autotuning and auto-scheduling work on macOS and Windows. Currently autotvm does not work on macOS, so that is why I have been touching it.
   
   The tutorial you are using is broken. All code that needs to be run should be within an `if __name__ == "__main__":` statement. With this statement, and changing n_trials to 100, I get these results:
   
   On main
   ```
   SA iter: 50	last_update: 49	max-0: 5.05	max-1: 5.32	temp: 0.90	elapsed: 22.72
   SA iter: 100	last_update: 98	max-0: 5.19	max-1: 5.43	temp: 0.80	elapsed: 44.25
   SA iter: 150	last_update: 147	max-0: 5.19	max-1: 5.43	temp: 0.70	elapsed: 65.84
   SA iter: 200	last_update: 198	max-0: 5.29	max-1: 5.43	temp: 0.60	elapsed: 87.38
   SA iter: 250	last_update: 248	max-0: 5.29	max-1: 5.43	temp: 0.50	elapsed: 108.47
   SA iter: 300	last_update: 299	max-0: 5.31	max-1: 5.43	temp: 0.40	elapsed: 129.17
   SA iter: 350	last_update: 331	max-0: 5.32	max-1: 5.43	temp: 0.30	elapsed: 149.26
   SA iter: 400	last_update: 396	max-0: 5.32	max-1: 5.43	temp: 0.20	elapsed: 168.95
   SA iter: 450	last_update: 448	max-0: 5.32	max-1: 5.43	temp: 0.10	elapsed: 187.60
   SA iter: 500	last_update: 499	max-0: 5.32	max-1: 5.43	temp: 0.00	elapsed: 204.33
   SA iter: 500	last_update: 499	elapsed: 204.33
   ```
   
   On this PR
   ```
   SA iter: 50	last_update: 49	max-0: 6.39	max-1: 6.87	temp: 0.90	elapsed: 23.53
   SA iter: 100	last_update: 97	max-0: 6.56	max-1: 7.05	temp: 0.80	elapsed: 46.57
   SA iter: 150	last_update: 149	max-0: 6.68	max-1: 7.05	temp: 0.70	elapsed: 69.27
   SA iter: 200	last_update: 192	max-0: 6.69	max-1: 7.05	temp: 0.60	elapsed: 91.90
   SA iter: 250	last_update: 238	max-0: 6.71	max-1: 7.05	temp: 0.50	elapsed: 114.53
   SA iter: 300	last_update: 294	max-0: 6.77	max-1: 7.05	temp: 0.40	elapsed: 136.67
   SA iter: 350	last_update: 348	max-0: 6.77	max-1: 7.05	temp: 0.30	elapsed: 158.47
   SA iter: 400	last_update: 398	max-0: 6.87	max-1: 7.05	temp: 0.20	elapsed: 179.36
   SA iter: 450	last_update: 447	max-0: 6.87	max-1: 7.05	temp: 0.10	elapsed: 198.81
   SA iter: 500	last_update: 497	max-0: 6.87	max-1: 7.05	temp: 0.00	elapsed: 216.06
   SA iter: 500	last_update: 497	elapsed: 216.06
   ```
   
   That puts the runtimes pretty close (I'm not sure what the variance in times is here).


----------------------------------------------------------------



[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717548491


   They are broken with respect to the multiprocessing changes I have introduced in this PR. Technically speaking they were never correct, as the script to be executed should be wrapped in an `if __name__ == "__main__":` block.
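   For reference, the pattern the scripts need under the spawn start method looks like this; `tune_kernel` here is a placeholder for a tutorial's tuning code, not an actual TVM function:

```python
import multiprocessing


def tune_kernel():
    # Placeholder for the tutorial body (task creation, tuner.tune(), ...).
    # Under spawn, worker processes re-import this module, so any code at
    # module level would run again in every worker without the guard below.
    return "tuned"


if __name__ == "__main__":
    # Force spawn to reproduce the macOS/Windows behavior on any OS.
    multiprocessing.set_start_method("spawn", force=True)
    print(tune_kernel())
```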


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716516782


   Thanks for the refactoring! I would like to see benchmark results for both autotvm and auto-scheduler, so we don't get performance regression.


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716534539


   Another concern is that I introduced a new kind of workload in this PR (#6710) for relay integration.
   When using auto-scheduler in Relay, we don't serialize a ComputeDAG by function + arguments.
   We save a copy of the TE expressions generated by relay as the workload.
   So we have to support serializing TE expressions after this PR. But I am not sure if it works because there are pointers in TE expressions.


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717588164


   I added `__name__ == "__main__"` but the results are the same.
   
   Can you delete all autotvm related code and tutorials in this PR? We can investigate later. In the meanwhile, can you try on a machine with more cores?
   Can you check whether your modifications to the tutorials break their format? 
   I don't think the tutorial scripts can generate correct RST files after adding indentation to the text block.
   You can build the tutorials locally and check the generated HTML files.
   
   This PR and #6710 are conflicting. I want to merge this first so I can do merge and clean up on my PR. Otherwise, you have to do more work by yourself.


----------------------------------------------------------------



[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717560438


   I took off the `__name__ == "__main__"` lines and it performs the same. `__name__ == "__main__"` is still required for non-Linux machines (i.e. the tutorials are buggy on non-Linux machines), but it isn't causing the issue here. The only difference I can see then is that your machine has so many threads. Can you reproduce your results on a different machine?


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717550912






----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR solves the problem while not bringing any performance regression.
   But I am not sure about the autotvm part. Your test case is not correct. With `n-trials=16`, the code modified by you will not even be executed, because the cost model only starts to work after getting enough training samples (n-trials > 64). To do the benchmark correctly, you should use a larger matrix size (n=1024), run autotvm with n-trials > 64 and `XGBTuner`, and report the time used in simulated annealing (which actually runs the modified `xgboost_cost_model.py`).
   Could you delete the autotvm part in this PR (files : `python/tvm/autotvm/task/task.py`, `python/tvm/autotvm/tuner/xgboost_cost_model.py`) so we can merge this?


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717619470


   How about also undoing the tutorials? I guess their format will be wrong because you add indentation to text blocks.
   For the autotvm problem, I am okay to accept the following solution:
   provide two versions of the code and pick the version of code according to OS.


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717619470


   How about also undoing the tutorials? I guess their format will be wrong because you add indentation to text blocks.
   For the autotvm problem, I am okay to accept the following solution:
   1. provide two versions of the code.
   2. pick the version of code according to OS.


----------------------------------------------------------------



[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717601781


   Here is the tests I've run on a 64-core machine:
   
   ```
   This PR
   XGB iter:   0	tr-a-recall@64: 0.598716	tr-map: 0.142857
   XGB iter:  25	tr-a-recall@64: 0.667324	tr-map: 0.500000
   XGB stopped. Best iteration: [13] tr-a-recall@64:0.67690	tr-map:0.50000
   XGB train: 1.24	obs: 64	error: 50	n_cache: 64
   SA iter: 50	last_update: 49	max-0: 4.42	max-1: 4.71	temp: 0.90	elapsed: 16.14
   SA iter: 100	last_update: 97	max-0: 4.55	max-1: 4.71	temp: 0.80	elapsed: 30.59
   SA iter: 150	last_update: 145	max-0: 4.56	max-1: 4.74	temp: 0.70	elapsed: 45.74
   SA iter: 200	last_update: 197	max-0: 4.58	max-1: 4.83	temp: 0.60	elapsed: 60.61
   SA iter: 250	last_update: 249	max-0: 4.67	max-1: 4.84	temp: 0.50	elapsed: 76.09
   SA iter: 300	last_update: 299	max-0: 4.69	max-1: 4.84	temp: 0.40	elapsed: 90.07
   SA iter: 350	last_update: 349	max-0: 4.71	max-1: 4.84	temp: 0.30	elapsed: 104.27
   SA iter: 400	last_update: 377	max-0: 4.71	max-1: 4.84	temp: 0.20	elapsed: 118.12
   SA iter: 450	last_update: 448	max-0: 4.72	max-1: 4.84	temp: 0.10	elapsed: 130.86
   SA iter: 500	last_update: 498	max-0: 4.76	max-1: 4.84	temp: 0.00	elapsed: 143.39
   SA iter: 500	last_update: 498	elapsed: 143.39
   
   main
   XGB iter:   0	tr-a-recall@64: 0.641594	tr-map: 0.500000
   XGB iter:  25	tr-a-recall@64: 0.716034	tr-map: 0.500000
   XGB iter:  50	tr-a-recall@64: 0.728245	tr-map: 1.000000
   XGB stopped. Best iteration: [38] tr-a-recall@64:0.73020	tr-map:0.50000
   XGB train: 1.10	obs: 64	error: 46	n_cache: 64
   SA iter: 50	last_update: 49	max-0: 6.21	max-1: 6.94	temp: 0.90	elapsed: 3.40
   SA iter: 100	last_update: 99	max-0: 6.50	max-1: 7.16	temp: 0.80	elapsed: 6.68
   SA iter: 150	last_update: 149	max-0: 6.74	max-1: 7.38	temp: 0.70	elapsed: 10.24
   SA iter: 200	last_update: 199	max-0: 6.86	max-1: 7.39	temp: 0.60	elapsed: 13.91
   SA iter: 250	last_update: 249	max-0: 6.97	max-1: 7.39	temp: 0.50	elapsed: 17.68
   SA iter: 300	last_update: 294	max-0: 7.07	max-1: 7.56	temp: 0.40	elapsed: 21.29
   SA iter: 350	last_update: 348	max-0: 7.11	max-1: 7.56	temp: 0.30	elapsed: 24.52
   SA iter: 400	last_update: 399	max-0: 7.15	max-1: 7.56	temp: 0.20	elapsed: 27.68
   SA iter: 450	last_update: 449	max-0: 7.16	max-1: 7.56	temp: 0.10	elapsed: 30.63
   SA iter: 500	last_update: 499	max-0: 7.17	max-1: 7.56	temp: 0.00	elapsed: 33.48
   SA iter: 500	last_update: 499	elapsed: 33.48
   ```
   Seems like the issue is hit when there are a lot of cores.
   
   I've pulled out the autotvm parts and will send another PR when I figure those out.


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-708845921


   Autotvm uses a lot of python multiprocessing and I expect it will be much slower when using spawn.  AutoTVM uses multiprocessing for feature extraction. So it needs to launch about 50,000 processes every measurement batch.
   
   Ansor does not rely on multiprocessing as much, so the situation for Ansor is better.
   However, I'd like to see benchmark numbers about fork vs. spawn.
   i.e. Run a search with `n-trials=64` with (Ansor, AutoTVM) x (fork, spawn) X (CPU, GPU) (Optional)


----------------------------------------------------------------



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR solves the problem while not bringing any performance regression.
   But I am not sure about the autotvm part. Your test case is not correct: with `n-trials=16`, the code you modified will not even be executed, because the cost model only starts to work after getting enough training samples (n-trials > 64). To do the benchmark correctly, you should use a larger matrix size (n=1024), run autotvm with n-trials > 64 and `XGBTuner`, and report the time used in simulated annealing (which actually uses the modified cost model).
   Could you delete the autotvm part in this PR so we can merge this?





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR solves the problem while not bringing any performance regression.
   But I am not sure about the autotvm part. Could you delete the autotvm part in this PR so we can merge this?





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717588164


   I added the `if __name__ == "__main__":` guard but the results are the same.
   
   Can you delete all autotvm-related code and tutorials in this PR? We can investigate later. In the meantime, can you try on a machine with more cores?
   Can you check whether your modifications to the tutorial break its format? 
   I don't think the tutorial scripts can generate correct rst files after adding indentation to the text block.
   You can build the tutorials locally and check the built HTML files.
   
   This PR and #6710 are conflicting. I want to merge this first so I can do the merge and clean-up on my PR. Otherwise, you have to do more work by yourself.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717547728


   Which PR introduced the changes that broke the tutorials? I don't find related changes in your PR.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR solves the problem while not bringing any performance regression.
   But I am not sure about the autotvm part. Your test case is not correct. With `n-trials=16`, the code modified by you will not even be executed.
   Could you delete the autotvm part in this PR so we can merge this?





[GitHub] [incubator-tvm] merrymercy commented on a change in pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on a change in pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#discussion_r511923166



##########
File path: python/tvm/auto_scheduler/measure.py
##########
@@ -590,14 +634,20 @@ def local_builder_build(inputs, timeout, n_parallel, build_func="default", verbo
     res : List[BuildResult]
         The build results of these MeasureInputs.
     """
-    # We use fork and a global variable to copy arguments between processes.
-    # This can avoid expensive serialization of TVM IR when using multiprocessing.Pool
-    global GLOBAL_BUILD_ARGUMENTS
-
-    GLOBAL_BUILD_ARGUMENTS = (inputs, build_func, timeout, verbose)
-
-    pool = NoDaemonPool(n_parallel)
-    tuple_res = pool.map(local_build_worker, range(len(inputs)))
+    # This pool is not doing computationally intensive work, so we can use threads
+    pool = multiprocessing.pool.ThreadPool(n_parallel)

Review comment:
       Do we still need serialization with `ThreadPool`?
   I guess all these arguments will be shared in memory and will be passed as references.
   
   If this is true, then we don't need to change any other part of the code in this file.
   In addition, we can choose between `ProcessingPool` and `ThreadPool` here according to the OS.
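That guess can be checked directly. A small sketch (not code from this PR) showing that `ThreadPool` workers receive the very same objects by reference, with no pickling involved:

```python
import multiprocessing.pool

obj = object()  # stands in for a large, expensive-to-serialize argument
pool = multiprocessing.pool.ThreadPool(2)
# Threads share the interpreter heap, so each worker sees the identical
# object rather than a deserialized copy.
shared = pool.map(lambda o: o is obj, [obj, obj, obj])
pool.close()
pool.join()
# shared == [True, True, True]
```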







[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#discussion_r512097091



##########
File path: python/tvm/auto_scheduler/measure.py
##########
@@ -950,26 +1017,30 @@ def rpc_runner_run(
     res : List[MeasureResult]
         The measure results of these MeasureInputs.
     """
-    global GLOBAL_RUN_ARGUMENTS
-    GLOBAL_RUN_ARGUMENTS = (
-        inputs,
-        build_results,
-        key,
-        host,
-        port,
-        priority,
-        timeout,
-        number,
-        repeat,
-        min_repeat_ms,
-        cooldown_interval,
-        enable_cpu_cache_flush,
-        verbose,
-    )
-
     assert len(inputs) == len(build_results), "Measure input size should be equal to build results"
-    pool = NoDaemonPool(n_parallel)
-    tuple_res = pool.map(rpc_run_worker, range(len(build_results)))
+    # This pool is not doing computationally intensive work, so we can use threads

Review comment:
       The pool is doing basically no work. Each thread in the pool spawns a process and then waits till it times out.
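A rough sketch of that pattern (command list and timeout are illustrative, not TVM's actual API): each thread launches a child process and then simply blocks waiting for it, so the GIL never becomes a bottleneck:

```python
import multiprocessing.pool
import subprocess

def run_with_timeout(cmd, timeout_s=10):
    # The thread does almost no Python work: it starts a child process and
    # blocks in a syscall until the child exits or the timeout expires.
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None  # treat a timed-out measurement as failed

pool = multiprocessing.pool.ThreadPool(2)
results = pool.map(run_with_timeout, [["true"], ["false"]])  # exit codes of the children
pool.close()
pool.join()
```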







[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR solves the problem while not bringing any performance regression.
   But I am not sure about the autotvm part. Your test case is not correct: with `n-trials=16`, the code you modified will not even be executed, because the cost model only starts to work after getting enough training samples (n-trials > 64). To do the benchmark correctly, you should use a larger matrix size (n=1024), run autotvm with n-trials > 64 and `XGBTuner`, and report the time used in simulated annealing (which actually uses the modified cost model).
   Could you delete the autotvm part in this PR (`python/tvm/autotvm/task/task.py`, `python/tvm/autotvm/tuner/xgboost_cost_model.py`) so we can merge this?





[GitHub] [incubator-tvm] tqchen commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-715919654


   cc @jcf94 @comaniac @yzhliu please also help to take a look





[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717543079


   If you don't have the `if __name__ == "__main__":` guard, then the whole body of the module may be run in every subprocess. Could you test on your machine with this change?
   
   My environment is archlinux with a Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz and 1080. Python 3.8.6.
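A minimal sketch (hypothetical, not code from the PR) of why the guard matters under the spawn start method: spawn re-imports the main module in every child, so unguarded module-level code would re-execute there:

```python
import multiprocessing as mp

def square(x):
    # Workers must be defined at module top level: "spawn" pickles a
    # reference to the function and re-imports this module in the child.
    return x * x

def run_pool():
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        return pool.map(square, [1, 2, 3])

if __name__ == "__main__":
    # Without this guard, every spawned child would re-execute run_pool()
    # while re-importing the module and try to start more processes.
    print(run_pool())
```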





[GitHub] [incubator-tvm] merrymercy merged pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy merged pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671


   





[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717547728


   Which PR introduced the changes that broke the tutorials? I don't find any related changes in your PR.





[GitHub] [incubator-tvm] merrymercy commented on a change in pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on a change in pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#discussion_r511903881



##########
File path: python/tvm/auto_scheduler/measure.py
##########
@@ -950,26 +1017,30 @@ def rpc_runner_run(
     res : List[MeasureResult]
         The measure results of these MeasureInputs.
     """
-    global GLOBAL_RUN_ARGUMENTS
-    GLOBAL_RUN_ARGUMENTS = (
-        inputs,
-        build_results,
-        key,
-        host,
-        port,
-        priority,
-        timeout,
-        number,
-        repeat,
-        min_repeat_ms,
-        cooldown_interval,
-        enable_cpu_cache_flush,
-        verbose,
-    )
-
     assert len(inputs) == len(build_results), "Measure input size should be equal to build results"
-    pool = NoDaemonPool(n_parallel)
-    tuple_res = pool.map(rpc_run_worker, range(len(build_results)))
+    # This pool is not doing computationally intensive work, so we can use threads

Review comment:
       Did you benchmark the speed of ProcessingPool vs. ThreadPool?
   For the comment, is this pool "not doing computationally intensive work" or "not doing computationally intensive work in Python"?







[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717619470


   How about also reverting the tutorial changes? I guess their format will be wrong because you added indentation to the text blocks.
   For the autotvm problem, I am okay with accepting the following solution:
   1. provide two versions of the code.
   2. pick the version according to the OS.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-708845921


   AutoTVM uses a lot of Python multiprocessing, and I expect it will be much slower when using spawn. AutoTVM uses multiprocessing for feature extraction, so it needs to launch about 50,000 tasks per measurement batch.
   
   The situation for Ansor is better, as Ansor does not rely on multiprocessing as heavily as AutoTVM does.
   However, I'd like to see benchmark numbers for fork vs. spawn,
   i.e. run a search with `n_trials=64` with (Ansor, AutoTVM) x (fork, spawn) x (CPU, GPU) (optional).
   The requirement from my side is that I don't want to see any performance regression.
   I prefer to only use the slower and safer fallback mechanism when the fast approach does not work.
   
   Removing all multiprocessing requires a huge engineering effort.





[GitHub] [incubator-tvm] merrymercy commented on a change in pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on a change in pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#discussion_r511923166



##########
File path: python/tvm/auto_scheduler/measure.py
##########
@@ -590,14 +634,20 @@ def local_builder_build(inputs, timeout, n_parallel, build_func="default", verbo
     res : List[BuildResult]
         The build results of these MeasureInputs.
     """
-    # We use fork and a global variable to copy arguments between processes.
-    # This can avoid expensive serialization of TVM IR when using multiprocessing.Pool
-    global GLOBAL_BUILD_ARGUMENTS
-
-    GLOBAL_BUILD_ARGUMENTS = (inputs, build_func, timeout, verbose)
-
-    pool = NoDaemonPool(n_parallel)
-    tuple_res = pool.map(local_build_worker, range(len(inputs)))
+    # This pool is not doing computationally intensive work, so we can use threads
+    pool = multiprocessing.pool.ThreadPool(n_parallel)

Review comment:
       Do we still need serialization with `ThreadPool`?
   I guess all these arguments will be shared in memory and will be passed as references.
   
   If this is true, then we don't need to change any other part of the code in this file.







[GitHub] [incubator-tvm] tqchen commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717620222


   I agree that Sphinx-Gallery will run all tutorials together, and it would still be useful to keep the tutorials in ipynb style (no main).





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR solves the problem while not bringing any performance regression.
   But I am not sure about the autotvm part. Your test case is not correct. With `n-trials=16`, the code modified by you will not be executed.
   Could you delete the autotvm part in this PR so we can merge this?





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me.
   But I am not sure about the autotvm part.
   If you don't mind, I suggest deleting the autotvm part in this PR.





[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717550912


   Sorry, could you point me to the related lines? I only see that you renamed some variables in `python/tvm/autotvm/measure/local_executor.py`.








[GitHub] [incubator-tvm] merrymercy commented on a change in pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on a change in pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#discussion_r511940229



##########
File path: python/tvm/auto_scheduler/workload_registry.py
##########
@@ -175,6 +175,41 @@ def workload_key_to_tensors(workload_key):
     return lookup(*args)
 
 
+def get_workload(task):
+    """Get the workload function for a given task
+
+    Parameters
+    ----------
+    task : SearchTask
+        Task to get workload of.
+
+    Returns
+    -------
+    workload : callable
+        The registered workload function.
+    """
+    name = workload_name(task.workload_key)
+    lookup = WORKLOAD_FUNC_REGISTRY[name]
+    assert callable(lookup)
+    return lookup
+
+
+def workload_name(workload_key):

Review comment:
       ```suggestion
   def workload_func_name(workload_key):
   ```

##########
File path: python/tvm/auto_scheduler/workload_registry.py
##########
@@ -175,6 +175,41 @@ def workload_key_to_tensors(workload_key):
     return lookup(*args)
 
 
+def get_workload(task):

Review comment:
       ```suggestion
   def get_workload_func(task):
   ```







[GitHub] [incubator-tvm] tkonolige commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716780203


   Here are benchmark results on linux using forking. This is `dense` with 64 x 64 matrices, n-trials=16, 5 benchmarking iterations.
   
   | Branch | Auto-what | Target | Mean Time | STD |
   | ------ | --------- | ------ | --------- | --- |
   | main | autotvm        | llvm  | 20.429478 | 1.914881 |
   | PR | autotvm       | llvm | 26.400752 | 9.013005 |
   | main | autotvm        | cuda  |  7.010600 | 0.080864 |
   | PR | autotvm       | cuda |  7.001349 | 0.017357 |
   | main | autoscheduler  | llvm  | 16.711073 | 2.572820 |
   | PR | autoscheduler | llvm | 14.752735 | 0.757895 |
   | main | autoscheduler  | cuda  | 32.981713 | 0.072185 |
   | PR | autoscheduler | cuda | 32.283536 | 0.054209 |
   
   Looks pretty similar except for the high standard deviation in runtime. Not sure what's causing this. I still think we should merge this, as it fixes users not being able to run auto-scheduling on macOS.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR won't bring any performance regression.
   But I am not sure about the autotvm part. Could you delete the autotvm part in this PR so we can merge this?





[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-717588164


   I added the `if __name__ == "__main__":` guard but the results are the same.
   
   Can you delete all autotvm-related code and tutorials in this PR? We can investigate later. In the meantime, can you try on a machine with more cores?
   
   This PR and #6710 are conflicting. I want to merge this first so I can do the merge and clean-up on my PR. Otherwise, you have to do more work by yourself.





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR solves the problem while not bringing any performance regression.
   But I am not sure about the autotvm part. Your test case is not correct. With `n-trials=16`, the code modified by you will not even be executed. Because the cost model only starts to work after getting enough training samples (n-trials > 64). To do the benchmark correctly, you should use a larger matrix size, run the autotvm with n-trials > 64 and `XGBTuner`, and report the time used in simulated annealing (i.e., the part modified by you).
   Could you delete the autotvm part in this PR so we can merge this?











[GitHub] [incubator-tvm] merrymercy commented on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-708103769


   Note that "spawn" is slower than "fork" (https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods).
   We should not change all the default settings to "spawn".
   We should prefer "fork" as the default when possible.
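One way to express that preference, sketched here with an illustrative platform check (not TVM's actual policy): default to "fork" where it is available, falling back to "spawn" elsewhere:

```python
import multiprocessing
import sys

def pick_start_method():
    # Illustrative policy: "fork" is fast (no module re-import, cheap
    # argument passing) but is unavailable on Windows and unsafe on
    # macOS or once CUDA has been initialized.
    if sys.platform.startswith("linux"):
        return "fork"
    return "spawn"

ctx = multiprocessing.get_context(pick_start_method())
# ctx.Pool / ctx.Process now create workers with the chosen start method.
```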





[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6671: [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing's spawn start method

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6671:
URL: https://github.com/apache/incubator-tvm/pull/6671#issuecomment-716905146


   The auto-scheduler part looks good to me. I believe this PR solves the problem while not bringing any performance regression.
   But I am not sure about the autotvm part. Your test case is not correct. With `n-trials=16`, the code modified by you will not even be executed. Because the cost model only starts to work after getting enough training data (n-trials > 64). To do the benchmark correctly, you should use a larger matrix size, run the autotvm with n-trials > 64, and report the time used in simulated annealing.
   Could you delete the autotvm part in this PR so we can merge this?

