
Posted to commits@tvm.apache.org by "psrivas2 (via GitHub)" <gi...@apache.org> on 2023/04/03 14:11:00 UTC

[GitHub] [tvm] psrivas2 opened a new pull request, #14465: [Unity][BYOC] Faster cutlass codegen

psrivas2 opened a new pull request, #14465:
URL: https://github.com/apache/tvm/pull/14465

   This PR improves CUTLASS compilation time by compiling a single CSourceModule instead of creating and compiling one for each kernel.
   
   Creating and compiling a new CSourceModule for every function is slow, and it significantly slows down the compilation of models with many functions offloaded to CUTLASS. Instead, we can generate a single CSourceModule and compile it once to produce a single `runtime::Module`.
   This significantly reduces the CUTLASS compilation time of large models such as the SD UNet (from ~30 min to ~4 min). We see similar results on other large models.
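   
   A minimal, TVM-independent C++ sketch of the aggregation step described above: every offloaded function contributes its generated code and required headers to one combined source string, so the expensive compilation runs once. `GeneratedFunc` and `BuildSingleSource` are hypothetical names that model the `(code, headers)` pair returned by `GenCutlassFunc` in the diff; they are not TVM's actual types.
   
   ```cpp
   #include <cassert>
   #include <string>
   #include <vector>
   
   // Hypothetical stand-in for what a per-function generator (such as
   // GenCutlassFunc in the diff) returns: kernel source plus required headers.
   struct GeneratedFunc {
     std::string code;
     std::vector<std::string> headers;
   };
   
   // Merge all offloaded functions into ONE source string, so the compile
   // step runs once instead of once per function.
   std::string BuildSingleSource(const std::vector<GeneratedFunc>& funcs) {
     assert(!funcs.empty() && "expected at least one offloaded function");
     std::string headers;
     std::string code;
     for (const auto& f : funcs) {
       code += "\n" + f.code;
       for (const auto& h : f.headers) {
         // Like the diff, this may emit the same #include more than once.
         headers += "#include <" + h + ">\n";
       }
     }
     return headers + code;
   }
   ```
   
   In the actual change, the combined source is wrapped in a single CSourceModule, so the number of compiler invocations no longer grows with the number of offloaded functions.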
   
   #### Testing
   `tests/python/relax/test_codegen_cutlass.py::test_matmul_offload` is already broken at HEAD. All other tests pass with this PR when run locally.
   
   cc @masahi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] vinx13 merged pull request #14465: [Unity][BYOC] Faster cutlass codegen

Posted by "vinx13 (via GitHub)" <gi...@apache.org>.

vinx13 merged PR #14465:
URL: https://github.com/apache/tvm/pull/14465


[GitHub] [tvm] psrivas2 commented on a diff in pull request #14465: [Unity][BYOC] Faster cutlass codegen

Posted by "psrivas2 (via GitHub)" <gi...@apache.org>.

psrivas2 commented on code in PR #14465:
URL: https://github.com/apache/tvm/pull/14465#discussion_r1156430949


##########
src/relax/backend/contrib/cutlass/codegen.cc:
##########
@@ -219,11 +219,16 @@ class CodegenCutlass : public relax::MemoizedExprTranslator<OutputType>,
 
 class CutlassModuleCodegen {
  public:
-  runtime::Module CreateCSourceModule(Function f, const Map<String, ObjectRef>& options) {
+  runtime::Module CreateCSourceModule(Array<Function> functions,
+                                      const Map<String, ObjectRef>& options) {
     std::string headers = "";
-    auto [code, op_headers] = GenCutlassFunc(f, options);
-    for (const auto& header : op_headers) {
-      headers += "#include <" + header + ">\n";
+    std::string code = "";
+    for (const auto& f : functions) {
+      auto [f_code, op_headers] = GenCutlassFunc(f, options);
+      code += "\n" + f_code;
+      for (const auto& header : op_headers) {
+        headers += "#include <" + header + ">\n";

Review Comment:
   Yes. However, since this is a generated file, I felt it was OK to have duplicate entries in the header. We can improve on this in a follow-up PR.
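   
   One way such a follow-up could look, sketched in plain C++ under the assumption that headers should keep their first-seen order: a set records which headers were already emitted, so each `#include` line appears only once. `EmitUniqueIncludes` is a hypothetical helper, not part of TVM.
   
   ```cpp
   #include <string>
   #include <unordered_set>
   #include <vector>
   
   // Emit each header once, in first-seen order, given the header lists of
   // all offloaded functions. Order is kept in case later headers rely on
   // earlier ones.
   std::string EmitUniqueIncludes(
       const std::vector<std::vector<std::string>>& per_func_headers) {
     std::unordered_set<std::string> seen;
     std::string out;
     for (const auto& headers : per_func_headers) {
       for (const auto& h : headers) {
         // insert(...).second is true only the first time h is seen.
         if (seen.insert(h).second) {
           out += "#include <" + h + ">\n";
         }
       }
     }
     return out;
   }
   ```
   
   Duplicate includes are usually harmless to the compiler when headers carry include guards or `#pragma once`, so this mainly improves the readability of the generated file.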


[GitHub] [tvm] tvm-bot commented on pull request #14465: [Unity][BYOC] Faster cutlass codegen

Posted by "tvm-bot (via GitHub)" <gi...@apache.org>.

tvm-bot commented on PR #14465:
URL: https://github.com/apache/tvm/pull/14465#issuecomment-1494399612

   Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in a comment.
   
    * cc @billishyahao, @quic-sanirudh (see [#10317](https://github.com/apache/tvm/issues/10317) for details)
   
   Generated by [tvm-bot](https://github.com/apache/tvm/blob/main/ci/README.md#github-actions)


[GitHub] [tvm] psrivas2 commented on pull request #14465: [Unity][BYOC] Faster cutlass codegen

Posted by "psrivas2 (via GitHub)" <gi...@apache.org>.

psrivas2 commented on PR #14465:
URL: https://github.com/apache/tvm/pull/14465#issuecomment-1494910975

   > The original intention was to compile all generated files in parallel (via the NVCC `-t` flag), but I forgot to actually do it. Have you tested that? I expect that would be faster than this solution.
   
   Could you elaborate on what the `-t` flag does and how we would use it? The loop [here](https://github.com/apache/tvm/pull/14465/files#diff-9b184ba90f566eaeb8c34e4032b221378e91b04b382f44a6b74786ca09537044L258-L265) processes annotated functions sequentially, so I think we would still have to parallelize that.
   
   I did parallelize this [loop](https://github.com/apache/tvm/pull/14465/files#diff-9b184ba90f566eaeb8c34e4032b221378e91b04b382f44a6b74786ca09537044L258-L265) to compile the generated C source modules in parallel, but that wasn't faster than compiling a single file. The difference was not huge, but compiling a single source module was a bit faster (~50 seconds for a single source module vs. ~70 seconds for multiple C source modules compiled in parallel).
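   
   For comparison, the alternative described here (one source per function, compiled concurrently) has roughly the following shape. `CompileOne` is a hypothetical stub standing in for an NVCC invocation and `CompileAllInParallel` is an illustrative name; this is not the code that was benchmarked.
   
   ```cpp
   #include <future>
   #include <string>
   #include <vector>
   
   // Hypothetical placeholder for an expensive compile step (e.g. running
   // NVCC on one source file). Here it only returns a fake object name.
   std::string CompileOne(const std::string& source) {
     return "obj(" + std::to_string(source.size()) + ")";
   }
   
   // Launch one asynchronous task per source, then collect the results in
   // the original order.
   std::vector<std::string> CompileAllInParallel(
       const std::vector<std::string>& sources) {
     std::vector<std::future<std::string>> tasks;
     tasks.reserve(sources.size());
     for (const auto& src : sources) {
       tasks.push_back(std::async(std::launch::async, CompileOne, src));
     }
     std::vector<std::string> objects;
     objects.reserve(tasks.size());
     for (auto& t : tasks) {
       objects.push_back(t.get());  // blocks until that task finishes
     }
     return objects;
   }
   ```
   
   Each task still pays a fixed per-invocation cost, such as re-parsing the CUTLASS headers in every translation unit, which may be one reason the single-module build came out slightly ahead. On some toolchains, `std::async` with `std::launch::async` requires linking with `-pthread`.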

[GitHub] [tvm] masahi commented on pull request #14465: [Unity][BYOC] Faster cutlass codegen

Posted by "masahi (via GitHub)" <gi...@apache.org>.

masahi commented on PR #14465:
URL: https://github.com/apache/tvm/pull/14465#issuecomment-1494922361

   The `-t` flag sets the number of threads NVCC uses; see https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#threads-number-t. The Relay BYOC uses it to compile all files in parallel.
   
   https://github.com/apache/tvm/blob/5562d906f91c702be09392d282f5b4462a78f9fa/python/tvm/contrib/cutlass/build.py#L75
   
   I don't expect NVCC to use multiple threads to compile a single huge source file, but the numbers you describe do sound good.
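   
   To show where the flag goes, the C++ sketch below assembles an NVCC argument list with `-t`. The function name `NvccArgs`, the file names, and the exact flag set are assumptions for illustration; TVM's real options are built in `python/tvm/contrib/cutlass/build.py`. Per the NVCC documentation, `-t`/`--threads` caps the number of threads used to run compilation steps in parallel, which is described as helping mainly when compiling for multiple architectures.
   
   ```cpp
   #include <cassert>
   #include <string>
   #include <vector>
   
   // Build an illustrative NVCC command line that compiles one CUDA source
   // into an object file for a single SM version, allowing up to `threads`
   // parallel compilation threads.
   std::vector<std::string> NvccArgs(const std::string& src,
                                     const std::string& out, int sm,
                                     int threads) {
     assert(threads >= 0 && "thread count must be non-negative");
     const std::string arch = std::to_string(sm);
     return {
         "nvcc", "-c", src, "-o", out,
         "-gencode=arch=compute_" + arch + ",code=sm_" + arch,
         "-O3", "-std=c++17",
         "-t", std::to_string(threads),
     };
   }
   ```
   
   For example, `NvccArgs("kernels.cu", "kernels.o", 80, 8)` yields a command ending in `-t 8`.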


[GitHub] [tvm] masahi commented on pull request #14465: [Unity][BYOC] Faster cutlass codegen

Posted by "masahi (via GitHub)" <gi...@apache.org>.

masahi commented on PR #14465:
URL: https://github.com/apache/tvm/pull/14465#issuecomment-1494927124

   Actually, since `compile_cutlass_module` is also used by the Relax BYOC, I think we are already making use of the `-t` flag. Putting all sources into a single source module is the right solution to really benefit from multi-threaded compilation.


[GitHub] [tvm] masahi commented on a diff in pull request #14465: [Unity][BYOC] Faster cutlass codegen

Posted by "masahi (via GitHub)" <gi...@apache.org>.

masahi commented on code in PR #14465:
URL: https://github.com/apache/tvm/pull/14465#discussion_r1156423599


##########
src/relax/backend/contrib/cutlass/codegen.cc:
##########
@@ -219,11 +219,16 @@ class CodegenCutlass : public relax::MemoizedExprTranslator<OutputType>,
 
 class CutlassModuleCodegen {
  public:
-  runtime::Module CreateCSourceModule(Function f, const Map<String, ObjectRef>& options) {
+  runtime::Module CreateCSourceModule(Array<Function> functions,
+                                      const Map<String, ObjectRef>& options) {
     std::string headers = "";
-    auto [code, op_headers] = GenCutlassFunc(f, options);
-    for (const auto& header : op_headers) {
-      headers += "#include <" + header + ">\n";
+    std::string code = "";
+    for (const auto& f : functions) {
+      auto [f_code, op_headers] = GenCutlassFunc(f, options);
+      code += "\n" + f_code;
+      for (const auto& header : op_headers) {
+        headers += "#include <" + header + ">\n";

Review Comment:
   Here we might be adding duplicate headers. It probably won't matter for compilation speed, but the generated file might get ugly.


[GitHub] [tvm] masahi commented on pull request #14465: [Unity][BYOC] Faster cutlass codegen

Posted by "masahi (via GitHub)" <gi...@apache.org>.

masahi commented on PR #14465:
URL: https://github.com/apache/tvm/pull/14465#issuecomment-1494833971

   The original intention was to compile all generated files in parallel (via the NVCC `-t` flag), but I forgot to actually do it. Have you tested that? I expect that would be faster than this solution.