
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/04/19 00:00:07 UTC

[GitHub] [tvm] apeskov opened a new pull request #7877: Async measurer API for auto-scheduler scripts

apeskov opened a new pull request #7877:
URL: https://github.com/apache/tvm/pull/7877


   Introduce a callback-based async API for Builder/Runner. It makes it possible to build async pipelines for kernel evaluation. Overall, it speeds up the measurement process by overlapping the build and run stages.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] comaniac commented on pull request #7877: Async measurer API for auto-scheduler scripts

Posted by GitBox <gi...@apache.org>.

comaniac commented on pull request #7877:
URL: https://github.com/apache/tvm/pull/7877#issuecomment-822633881


   Thanks for the PR and the explanation. Meanwhile, could you elaborate on the use case that would benefit from this async process? AFAIK, we currently build N schedules in parallel on the host with multi-threading and measure their runtime on the device sequentially. Since the build time of each schedule is similar, the bottleneck is definitely the runtime measurement. In that case, when you have only one device, async build and run doesn't help.
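
The trade-off raised here can be made concrete with a rough timing model. The sketch below is only an illustration under simplifying assumptions: every batch has the same build time and the same run time, and there is one build stage and one run stage. The function names and the per-batch costs are hypothetical and are not taken from TVM.

```python
def sync_time(build_s, run_s, n_batches):
    """Blocking pipeline: every batch is fully built, then fully measured."""
    return n_batches * (build_s + run_s)


def overlapped_time(build_s, run_s, n_batches):
    """Two-stage pipeline: batch i+1 is built while batch i is measured."""
    # The first build and the last run cannot be hidden;
    # in between, the slower stage sets the pace.
    return build_s + run_s + (n_batches - 1) * max(build_s, run_s)


# Hypothetical per-batch costs in seconds, chosen only for illustration.
for build_s, run_s in [(10.0, 10.0), (2.0, 20.0)]:
    s = sync_time(build_s, run_s, n_batches=8)
    o = overlapped_time(build_s, run_s, n_batches=8)
    print(f"build={build_s} run={run_s}: sync={s} overlapped={o} speedup={s / o:.2f}x")
```

Under this model the gain from overlapping is largest when build and run times are comparable and shrinks as the run stage dominates, which is the situation described in the comment above.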



[GitHub] [tvm] FrozenGene commented on a change in pull request #7877: Async measurer API for auto-scheduler scripts

Posted by GitBox <gi...@apache.org>.

FrozenGene commented on a change in pull request #7877:
URL: https://github.com/apache/tvm/pull/7877#discussion_r615512179



##########
File path: include/tvm/auto_scheduler/measure.h
##########
@@ -340,9 +351,14 @@ class LocalBuilderNode : public ProgramBuilderNode {
  public:
   /*! \brief Build function. */
   String build_func;
+  /*! \brief Functor with python implementation of submit method. */
+  PackedFunc submit_func;

Review comment:
       Could this work well on the remote side, or does it only happen on the host side?

##########
File path: include/tvm/auto_scheduler/measure.h
##########
@@ -444,10 +469,9 @@ class RPCRunner : public ProgramRunner {
    * \param cooldown_interval The cool down interval between two measurements.
    * \param enable_cpu_cache_flush Whether to flush cache on CPU between repeated measurements.
    */
-  RPCRunner(const String& key, const String& host, int port, int priority, int n_parallel,
-            int timeout, int number, int repeat, int min_repeat_ms, double cooldown_interval,
-            bool enable_cpu_cache_flush);
-
+  RPCRunner(PackedFunc submit_func, const String& key, const String& host, int port, int priority,

Review comment:
       As in my previous comment, I doubt that `PackedFunc` can work well on the remote side. For cases like `cache flush`, we pass a `string` instead. See: https://github.com/apache/tvm/blob/main/python/tvm/auto_scheduler/measure.py#L849
   
   Let us add a test case to cover this.





[GitHub] [tvm] apeskov commented on a change in pull request #7877: Async measurer API for auto-scheduler scripts

Posted by GitBox <gi...@apache.org>.

apeskov commented on a change in pull request #7877:
URL: https://github.com/apache/tvm/pull/7877#discussion_r615830910



##########
File path: include/tvm/auto_scheduler/measure.h
##########
@@ -340,9 +351,14 @@ class LocalBuilderNode : public ProgramBuilderNode {
  public:
   /*! \brief Build function. */
   String build_func;
+  /*! \brief Functor with python implementation of submit method. */
+  PackedFunc submit_func;

Review comment:
       No, it will not. This function is for the host side only. It will never be transferred to or executed on a remote device.





[GitHub] [tvm] jcf94 commented on a change in pull request #7877: Async measurer API for auto-scheduler scripts

Posted by GitBox <gi...@apache.org>.

jcf94 commented on a change in pull request #7877:
URL: https://github.com/apache/tvm/pull/7877#discussion_r615504135



##########
File path: include/tvm/auto_scheduler/measure.h
##########
@@ -340,9 +351,14 @@ class LocalBuilderNode : public ProgramBuilderNode {
  public:
   /*! \brief Build function. */
   String build_func;
+  /*! \brief Functor with python implementation of submit method. */
+  PackedFunc submit_func;

Review comment:
       How about passing a wrapped build function as the build_func, like:
   ```python
   def wrapped_build_func(original_build_func, *args):
       ...
       # pass original_build_func to your submit logic
       ...
   ```
   Then we can avoid adding another strange API here.





[GitHub] [tvm] apeskov commented on pull request #7877: Async measurer API for auto-scheduler scripts

Posted by GitBox <gi...@apache.org>.

apeskov commented on pull request #7877:
URL: https://github.com/apache/tvm/pull/7877#issuecomment-822442102


   It looks like I should explain what was done and the reasons behind some of the design decisions.
   
   The previous state of the Runner/Builder implementation was as follows. The native C++ implementations of both classes use the Python static functions "auto_scheduler.local_builder.build" and "auto_scheduler.local_runner.run" respectively. Both Python functions are blocking, that is, they wait until all tasks in the internal ThreadPool are finished. No state is shared between runs: all threads started at the beginning of the Python build/run functions are joined at the end. The native implementation of the builder/runner doesn't retain any Python resources.
   
   **Problem 1**: Because of the blocking nature of this API, the "Run" step starts only after the "Build" step completes. While the host compiles kernels, the device waits; while the device executes kernels, the host waits. The host and the device are separate pieces of hardware and can work in parallel, so the Build and Run steps should be overlapped.
   
   **Problem 2**: By design, the tasks of a tuning round are split into batches. The end of each batch is a hard sync point/barrier that waits until all tasks in the batch are completed (a limitation of the synchronous API). When task times are imbalanced, workers that finish early sit idle until the slowest task in the batch completes.
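
The idle time caused by a per-batch barrier can be illustrated with a small simulation. This is a simplified model, not TVM code: the function names and task durations are hypothetical, each batch holds one task per worker, and the barrier-free variant simply hands the next task to whichever worker becomes free first.

```python
import heapq


def batched_makespan(durations, n_workers):
    """Barrier after every batch: each batch costs as much as its slowest task."""
    return sum(
        max(durations[i:i + n_workers])
        for i in range(0, len(durations), n_workers)
    )


def queued_makespan(durations, n_workers):
    """No barrier: a worker takes the next task as soon as it is free."""
    finish_times = [0.0] * n_workers
    heapq.heapify(finish_times)
    for d in durations:
        # The earliest-free worker picks up the next task.
        heapq.heappush(finish_times, heapq.heappop(finish_times) + d)
    return max(finish_times)


# Hypothetical, deliberately imbalanced task durations in seconds.
durations = [1, 9, 1, 9, 1, 9, 1, 9]
print(batched_makespan(durations, n_workers=2))
print(queued_makespan(durations, n_workers=2))
```

With balanced durations the two numbers coincide; the gap appears only when tasks within a batch take different amounts of time.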
   
   **The goal**: Make the Build and Run methods asynchronous (non-blocking). A Builder/Runner instance encapsulates a fixed pool of workers, and each Build/Run call just submits a task for execution without waiting for its completion. Synchronisation happens only once, at the end of the tuning round.
   
   **My suggestion for design changes**:
   * Make the Build and Run methods async with a callback mechanism. The completion callback is called when one atomic task is completed.
   ```cpp
   Build(...)  -> SubmitToBuild(... , callback);
   Run(...) -> SubmitToRun(... , callback);
   
   // Usage example before
   for (auto input : input_collection)
     build_result_collection.push_back(builder->Build(input));
   for (auto build_result : build_result_collection)
     run_result_collection.push_back(runner->Run(build_result));
   
   // Usage example after
   for (auto input : input_collection) {
     builder->SubmitToBuild(input, [&](auto build_result) {
       runner->SubmitToRun(build_result, [&](auto run_result) {
         // Callbacks run on worker threads, so guard the shared collection.
         std::lock_guard<std::mutex> guard(mutex);
         run_result_collection.push_back(run_result);
         cv.notify_one();
       });
     });
   }
   // Wait once, after all inputs have been submitted.
   std::unique_lock<std::mutex> lock(mutex);
   cv.wait(lock, [&] { return run_result_collection.size() == input_collection.size(); });
   ```
   The worker pool is attached to the builder/runner object and reused for all SubmitToBuild/SubmitToRun calls.
   
   * Make the worker ThreadPool a resource of the Builder/Runner. One builder instance uses a single ThreadPool instance, and releasing the Builder object releases its worker ThreadPool. There is no convenient way to attach a regular Python object to the native C++ implementation of FFI-based objects. The only way I found in the TVM sources is the following:
     - Create a Python object
     - Capture this object in a Python function
     - Wrap this function into a PackedFunc
     - Hold this PackedFunc instance as a field in the native class
   
     That's why I added the `PackedFunc submit_func` field instead of a simple string with a function name. It holds the worker ThreadPool object. If you know a more proper way to do that, please advise me.
   
   * Synchronisation and task dependency resolution are implemented at the `ProgramMeasurer` level. It can use SubmitToBuild/SubmitToRun however it wants: it may simulate the old blocking behaviour, add additional barriers, or just submit all existing tasks to the executors as a single scope and wait for their completion. A special flag, `async_mode`, switches ProgramMeasurer between the sync (old) and async behaviour.
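
The callback chaining described in this comment can be sketched in plain Python with `concurrent.futures`. This is a minimal illustration of the pattern, not the PR's implementation: `AsyncStage`, `measure_all`, and the string-producing stand-ins for build and run are hypothetical names, and error handling is omitted (a task that raises would leave the final wait blocked).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncStage:
    """Hypothetical stand-in for a Builder or Runner that owns its worker pool."""

    def __init__(self, func, n_workers):
        self._func = func
        self._pool = ThreadPoolExecutor(max_workers=n_workers)

    def submit(self, item, callback):
        """Queue one task and return immediately; callback receives the result."""
        future = self._pool.submit(self._func, item)
        future.add_done_callback(lambda f: callback(f.result()))

    def shutdown(self):
        self._pool.shutdown(wait=True)


def measure_all(inputs, builder, runner):
    """Submit every input, chain build -> run via callbacks, wait once at the end."""
    results = []
    cv = threading.Condition()

    def on_run_done(run_result):
        # Called from a worker thread, so guard the shared list.
        with cv:
            results.append(run_result)
            cv.notify_all()

    def on_build_done(build_result):
        # As soon as one build finishes, hand it to the runner.
        runner.submit(build_result, on_run_done)

    for inp in inputs:
        builder.submit(inp, on_build_done)

    # Single synchronisation point for the whole round.
    with cv:
        cv.wait_for(lambda: len(results) == len(inputs))
    return results


builder = AsyncStage(lambda x: f"built({x})", n_workers=4)  # builds run in parallel
runner = AsyncStage(lambda b: f"ran({b})", n_workers=1)     # one device, one run at a time
out = measure_all(["s0", "s1", "s2"], builder, runner)
builder.shutdown()
runner.shutdown()
print(sorted(out))  # completion order is not deterministic, so sort for display
```

Each stage keeps its pool for its whole lifetime, so repeated calls to `measure_all` reuse the same workers, which mirrors the intent of attaching the pool to the builder/runner object.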
   



[GitHub] [tvm] apeskov commented on a change in pull request #7877: Async measurer API for auto-scheduler scripts

Posted by GitBox <gi...@apache.org>.

apeskov commented on a change in pull request #7877:
URL: https://github.com/apache/tvm/pull/7877#discussion_r615833856



##########
File path: include/tvm/auto_scheduler/measure.h
##########
@@ -444,10 +469,9 @@ class RPCRunner : public ProgramRunner {
    * \param cooldown_interval The cool down interval between two measurements.
    * \param enable_cpu_cache_flush Whether to flush cache on CPU between repeated measurements.
    */
-  RPCRunner(const String& key, const String& host, int port, int priority, int n_parallel,
-            int timeout, int number, int repeat, int min_repeat_ms, double cooldown_interval,
-            bool enable_cpu_cache_flush);
-
+  RPCRunner(PackedFunc submit_func, const String& key, const String& host, int port, int priority,

Review comment:
       This function is designed for use on the host only. The PackedFunc is used here just to hold a Python object at the native C++ level. There is no scenario in which this function is transferred to the device.





[GitHub] [tvm] apeskov commented on a change in pull request #7877: Async measurer API for auto-scheduler scripts

Posted by GitBox <gi...@apache.org>.

apeskov commented on a change in pull request #7877:
URL: https://github.com/apache/tvm/pull/7877#discussion_r615828975



##########
File path: include/tvm/auto_scheduler/measure.h
##########
@@ -340,9 +351,14 @@ class LocalBuilderNode : public ProgramBuilderNode {
  public:
   /*! \brief Build function. */
   String build_func;
+  /*! \brief Functor with python implementation of submit method. */
+  PackedFunc submit_func;

Review comment:
       The main reason for using a PackedFunc here instead of a string with a function name is the following. I have to hold a ThreadPool object that will execute the submitted tasks. The worker ThreadPool is not a singleton; it is an essential part of a Builder/Runner instance and cannot be covered by the `@tvm._ffi.register_func` mechanism.
   
   In other words, `submit_func` is not just a static function; it is a holder for Python-based resource objects and methods.
   
   I've tried to explain this in a bit more detail in this [post](https://github.com/apache/tvm/pull/7877#issuecomment-822442102)
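
The "function as resource holder" idea can be shown in plain Python, independent of TVM. In this sketch, `make_submit_func` is a hypothetical helper: it creates a private worker pool and returns a function whose closure captures that pool. The weak reference is there only to observe that the pool stays alive while the function is referenced; the step of wrapping the function into a PackedFunc is not shown.

```python
import weakref
from concurrent.futures import ThreadPoolExecutor


def make_submit_func(n_workers):
    """Return a function whose closure owns a private worker pool."""
    pool = ThreadPoolExecutor(max_workers=n_workers)

    def submit_func(task, *args):
        # `pool` is captured by the closure, so it stays alive for as long
        # as something holds a reference to `submit_func`.
        return pool.submit(task, *args)

    # The weak reference lets the caller observe the pool without owning it.
    return submit_func, weakref.ref(pool)


submit_func, pool_ref = make_submit_func(n_workers=2)

# No other strong reference to the pool exists outside the closure,
# yet the pool is still reachable through `submit_func`.
assert pool_ref() is not None
result = submit_func(pow, 2, 10).result()
print(result)
```

Two builders created this way would each get their own pool, which is the property a globally registered function name cannot provide.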



