You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/04/29 11:47:00 UTC

[GitHub] [tvm] argrento opened a new pull request, #11180: Change of OpenCL profiling logic

argrento opened a new pull request, #11180:
URL: https://github.com/apache/tvm/pull/11180

   Profiling in TVM is enabled or disable in compile time by the `USE_PROFILER` switch. It means that if we enable profile in `config.cmake`, but do not use any profiling features in the app, OpenCL is forced to collect `cl_events` objects.
   
   Build TVM with `set(USE_PROFILER ON)`.
   Consider simple app, where we create module from the `.so` file:
   ```c
   tvm::runtime::Module mod_factory = tvm::runtime::Module::LoadFromFile("model.so");
   tvm::runtime::Module gmod = mod_factory.GetFunction("default")(ctx);
   tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
   tvm::runtime::PackedFunc get_input = gmod.GetFunction("get_input");
   tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");
   tvm::runtime::PackedFunc run = gmod.GetFunction("run");
   
   // set inputs and outputs
   
   size_t niterations = 5000;
   for (size_t i = 0; i < niterations; i++) {
     run();
   }
   ```
   
   Then we collect memory usage info with Valgrind.
   ```
       MB
   818.5^                                                                       #
        |                                                                    @@@#
        |                                                                @@@@@@@#
        |                                                            @@@@@@@@@@@#
        |                                                         @@@@@@@@@@@@@@#
        |                                                     ::::@@@@@@@@@@@@@@#
        |                                                  @:::: :@@@@@@@@@@@@@@#
        |                                             @@@:@@:::: :@@@@@@@@@@@@@@#
        |                                           @@@ @:@@:::: :@@@@@@@@@@@@@@#
        |                                      :@@@@@@@ @:@@:::: :@@@@@@@@@@@@@@#
        |                                  :@@::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
        |                               ::::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
        |                           :@@@: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
        |                       :::@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
        |                   ::@@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
        |                @@@: @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
        |           :::::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
        | @@:::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
        | @ :::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
        | @ :::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
      0 +----------------------------------------------------------------------->Gi
        0                                                                   75.95
   ```
   
   We do not use any profiling info, but it is collected implicitly because of the compile-time switches:
   1. https://github.com/apache/tvm/blob/6babb89cbb9fc5ab718f8b996c7ce60bf5ebbefd/src/runtime/opencl/opencl_device_api.cc#L431
   2. https://github.com/apache/tvm/blob/6babb89cbb9fc5ab718f8b996c7ce60bf5ebbefd/src/runtime/opencl/opencl_module.cc#L84
   
   
   
   With the proposed modifications this behavior is changed: `clCommandQueue` by default is created in the normal mode and is recreated again with profiling capabilities when user calls profiler explicitly. When a profiling session is finished, the queue is recreated again in normal mode, which allows to mix `profile()` calls and `run()` calls.
   
   With the proposed changes valgrind shows no abnormal memory usage for the example above.
   
   ```
       MB
   148.9^#                                                                       
        |#::::::::::::::::::::::::::@:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
        |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
      0 +----------------------------------------------------------------------->Gi
        0                                                                   83.07
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] argrento commented on a diff in pull request #11180: [OpenCL] Change of OpenCL profiling logic

Posted by GitBox <gi...@apache.org>.

argrento commented on code in PR #11180:
URL: https://github.com/apache/tvm/pull/11180#discussion_r861807517


##########
src/runtime/opencl/opencl_common.h:
##########
@@ -422,23 +424,57 @@ class OpenCLTimerNode : public TimerNode {
   virtual void Start() {
     cl::OpenCLWorkspace::Global()->GetEventQueue(dev_).clear();
     this->duration = 0;
+
+    if (!cl::OpenCLWorkspace::Global()->profiling) {
+      // Very first call of Start() leads to the recreation of
+      // OpenCL command queue in profiling mode. This allows to run profile after inference.
+      OPENCL_CALL(clFlush(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
+      OPENCL_CALL(clFinish(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
+      OPENCL_CALL(clReleaseCommandQueue(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
+
+      cl_int err_code;
+      cl_device_id did = cl::OpenCLWorkspace::Global()->devices[dev_.device_id];
+      auto profiling_queue = clCreateCommandQueue(cl::OpenCLWorkspace::Global()->context, did,
+                                                  CL_QUEUE_PROFILING_ENABLE, &err_code);
+      OPENCL_CHECK_ERROR(err_code);
+      cl::OpenCLWorkspace::Global()->queues[dev_.device_id] = profiling_queue;
+      cl::OpenCLWorkspace::Global()->profiling = true;
+    }

Review Comment:
   I think it's a good idea.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] echuraev commented on a diff in pull request #11180: [OpenCL] Change of OpenCL profiling logic

Posted by GitBox <gi...@apache.org>.

echuraev commented on code in PR #11180:
URL: https://github.com/apache/tvm/pull/11180#discussion_r861824053


##########
src/runtime/opencl/opencl_common.h:
##########
@@ -229,6 +229,8 @@ class OpenCLWorkspace : public DeviceAPI {
   cl_context context{nullptr};
   // whether the workspace it initialized.
   bool initialized_{false};
+  // whether the workspace is in profiling mode.
+  bool profiling{false};

Review Comment:
   Just another idea. It is not critical. You can get the properties on the queue and check if the profiling was enabled or not. In this case, it is not necessary to introduce this variable.
   Note: you can request properties by using [clGetCommandQueueInfo](https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/clGetCommandQueueInfo.html)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] echuraev commented on a diff in pull request #11180: [OpenCL] Change of OpenCL profiling logic

Posted by GitBox <gi...@apache.org>.

echuraev commented on code in PR #11180:
URL: https://github.com/apache/tvm/pull/11180#discussion_r861824053


##########
src/runtime/opencl/opencl_common.h:
##########
@@ -229,6 +229,8 @@ class OpenCLWorkspace : public DeviceAPI {
   cl_context context{nullptr};
   // whether the workspace it initialized.
   bool initialized_{false};
+  // whether the workspace is in profiling mode.
+  bool profiling{false};

Review Comment:
   Just another idea for discuss. You can get the properties on the queue and check if the profiling was enabled or not. In this case, it is not necessary to introduce this variable.
   Note: you can request properties by using [clGetCommandQueueInfo](https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/clGetCommandQueueInfo.html)
   
   Also, in case if you have several queues for different devices. Using global variable won't be safe.
   Probably in this case it will be better to introduce method `isProfilingEnabled(device or queue)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] echuraev commented on a diff in pull request #11180: [OpenCL] Change of OpenCL profiling logic

Posted by GitBox <gi...@apache.org>.

echuraev commented on code in PR #11180:
URL: https://github.com/apache/tvm/pull/11180#discussion_r861824053


##########
src/runtime/opencl/opencl_common.h:
##########
@@ -229,6 +229,8 @@ class OpenCLWorkspace : public DeviceAPI {
   cl_context context{nullptr};
   // whether the workspace it initialized.
   bool initialized_{false};
+  // whether the workspace is in profiling mode.
+  bool profiling{false};

Review Comment:
   Just another idea for discuss. You can get the properties on the queue and check if the profiling was enabled or not. In this case, it is not necessary to introduce this variable.
   Note: you can request properties by using [clGetCommandQueueInfo](https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/clGetCommandQueueInfo.html)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] echuraev commented on a diff in pull request #11180: [OpenCL] Change of OpenCL profiling logic

Posted by GitBox <gi...@apache.org>.

echuraev commented on code in PR #11180:
URL: https://github.com/apache/tvm/pull/11180#discussion_r861760594


##########
src/runtime/opencl/opencl_common.h:
##########
@@ -422,23 +424,57 @@ class OpenCLTimerNode : public TimerNode {
   virtual void Start() {
     cl::OpenCLWorkspace::Global()->GetEventQueue(dev_).clear();
     this->duration = 0;
+
+    if (!cl::OpenCLWorkspace::Global()->profiling) {
+      // Very first call of Start() leads to the recreation of
+      // OpenCL command queue in profiling mode. This allows to run profile after inference.
+      OPENCL_CALL(clFlush(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
+      OPENCL_CALL(clFinish(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
+      OPENCL_CALL(clReleaseCommandQueue(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
+
+      cl_int err_code;
+      cl_device_id did = cl::OpenCLWorkspace::Global()->devices[dev_.device_id];
+      auto profiling_queue = clCreateCommandQueue(cl::OpenCLWorkspace::Global()->context, did,
+                                                  CL_QUEUE_PROFILING_ENABLE, &err_code);
+      OPENCL_CHECK_ERROR(err_code);
+      cl::OpenCLWorkspace::Global()->queues[dev_.device_id] = profiling_queue;
+      cl::OpenCLWorkspace::Global()->profiling = true;
+    }

Review Comment:
   This code is almost the same as in the destructor. Maybe we could create a private method e.g. `switchProfilingMode()` or `enableProfiling(bool)` and move this logic into this method?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] masahi merged pull request #11180: [OpenCL] Change of OpenCL profiling logic

Posted by GitBox <gi...@apache.org>.

masahi merged PR #11180:
URL: https://github.com/apache/tvm/pull/11180


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] echuraev commented on pull request #11180: [OpenCL] Change of OpenCL profiling logic

Posted by GitBox <gi...@apache.org>.

echuraev commented on PR #11180:
URL: https://github.com/apache/tvm/pull/11180#issuecomment-1122597316

   @masahi could you please take a look at this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] echuraev commented on a diff in pull request #11180: [WIP] [OpenCL] Change of OpenCL profiling logic

Posted by GitBox <gi...@apache.org>.

echuraev commented on code in PR #11180:
URL: https://github.com/apache/tvm/pull/11180#discussion_r866715240


##########
src/runtime/opencl/opencl_common.h:
##########
@@ -459,20 +461,10 @@ class OpenCLTimerNode : public TimerNode {
   virtual int64_t SyncAndGetElapsedNanos() { return this->duration; }
   // destructor
   virtual ~OpenCLTimerNode() {
-    if (cl::OpenCLWorkspace::Global()->profiling) {
+    if (cl::OpenCLWorkspace::Global()->IsProfiling(dev_)) {
       // Profiling session ends, recreate clCommandQueue in non-profiling mode
       // This will disable collection of cl_events in case of executing inference after profile
-      OPENCL_CALL(clFlush(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
-      OPENCL_CALL(clFinish(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
-      OPENCL_CALL(clReleaseCommandQueue(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
-
-      cl_int err_code;
-      cl_device_id did = cl::OpenCLWorkspace::Global()->devices[dev_.device_id];
-      auto normal_queue =
-          clCreateCommandQueue(cl::OpenCLWorkspace::Global()->context, did, 0, &err_code);
-      OPENCL_CHECK_ERROR(err_code);
-      cl::OpenCLWorkspace::Global()->queues[dev_.device_id] = normal_queue;
-      cl::OpenCLWorkspace::Global()->profiling = false;
+      recreateCommandQueue(false);

Review Comment:
   It's up to you, just my opinion that this method name with boolean argument might confuse the developers, and it will be necessary to read the implementation of this method.
   I would suggest moving the if condition from line 464 into this method and rename it to `enableProfiling(bool)`. Internally, we will detect the current state of the queue and recreate command queue when it is necessary.



##########
src/runtime/opencl/opencl_common.h:
##########
@@ -276,6 +274,20 @@ class OpenCLWorkspace : public DeviceAPI {
         << "Invalid OpenCL device_id=" << dev.device_id;
     return events[dev.device_id];
   }
+  // is current clCommandQueue in profiling mode
+  bool IsProfiling(Device dev) {
+    ICHECK(IsOpenCLDevice(dev));
+    this->Init();
+    ICHECK(dev.device_id >= 0 && static_cast<size_t>(dev.device_id) < queues.size())
+        << "Invalid OpenCL device_id=" << dev.device_id;
+    cl_command_queue queue = queues[dev.device_id];

Review Comment:
   Use `GetQueue` method.



##########
src/runtime/opencl/opencl_common.h:
##########
@@ -485,6 +477,26 @@ class OpenCLTimerNode : public TimerNode {
  private:
   int64_t duration;
   Device dev_;
+
+  void recreateCommandQueue(bool profiling) {
+    cl_command_queue_properties prop;
+    if (profiling) {
+      prop = CL_QUEUE_PROFILING_ENABLE;
+    } else {
+      prop = 0;
+    }
+
+    OPENCL_CALL(clFlush(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
+    OPENCL_CALL(clFinish(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));
+    OPENCL_CALL(clReleaseCommandQueue(cl::OpenCLWorkspace::Global()->GetQueue(dev_)));

Review Comment:
   nit: You call `cl::OpenCLWorkspace::Global()->GetQueue(dev_)` three times. Probably creating local variable for queue will do this code more readable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] srkreddy1238 commented on pull request #11180: [OpenCL] Change of OpenCL profiling logic

Posted by GitBox <gi...@apache.org>.

srkreddy1238 commented on PR #11180:
URL: https://github.com/apache/tvm/pull/11180#issuecomment-1233742441

   @argrento 
   
   This PR causes CLML profiling failure. Reason explained here
   
   In general the workspaces can be accessed and shared via “device_api.opencl”. CLML integration shares the workspace created by default OpenCL and it has a reference to the command queue.
   
   Why do we enable profiler in compilation when we don't want to profile any thing ? 
   Or 
   Is there any case where we enable profiling some thing else but not OpenCL  / dynamically enable/disable profiling? In such cases we could think of having different compilation flag for OpenCL or environment varible to control at runtime launch.
   
   At any case dynamically recreating the queue will cause issues for other components.
   
   @masahi & @valmat07 comment pls.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org