You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/12/03 00:07:07 UTC

[GitHub] [tvm] kevinthesun opened a new pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

kevinthesun opened a new pull request #7018:
URL: https://github.com/apache/tvm/pull/7018


   This fix also works for gpu argwhere.
   
   @zhiics @anijain2305 @mbrookhart @Laurawly 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] kevinthesun commented on a change in pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

kevinthesun commented on a change in pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#discussion_r535663839



##########
File path: python/tvm/topi/cuda/sort.py
##########
@@ -561,10 +561,11 @@ def topk_thrust(data, k=1, axis=-1, ret_type="both", is_ascend=False, dtype="int
         tag="topk_gpu",
     )
 
-    if k > 0:
+    if not isinstance(k, int) or k > 0:
         beg = [0] * ndim
-        end = data.shape[:-1] + [k]
-        out = [strided_slice(o, beg, end) for o in out]
+        end = data.shape[:-1] + [k if isinstance(k, int) else tvm.te.size_var("dim")]
+        strides = [1] * ndim
+        out = [strided_slice(o, beg, end, strides) for o in out]

Review comment:
       I modified cuda topk so that topk in dyn can pass. However, topk in test any in which data has dynamic shape can't pass without Thrust. I disable that test for now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] kevinthesun commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

kevinthesun commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738342059


   AFAIK cuda sort has several issues:
   1. Performance is bad for large workloads.
   2. Can't handle dynamic data shape well.
   3. Can generate flaky result.
   
   There is no clear path to a solution to these problems. For now the best way is to let user turn on Thrust, when they want to compile sort related op on nvidia gpu.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] zhiics commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

zhiics commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738355992


   @mbrookhart yeah, argwhere is flaky on large inputs if sort is used


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] kevinthesun edited a comment on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

kevinthesun edited a comment on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738287828


   I think we can raise an exception when compiling dynamic topk but Thrust is not enabled. Building with Thrust usually needs extra effort since it requires cmake >=3.13. User can enable it when necessary. For tvm cuda sorting, I'm not sure whether it covers some cases  which Thrust doesn't. Maybe we can keep it a while.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] kevinthesun commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

kevinthesun commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738240723


   @mbrookhart Generally we need thrust for this dynamic sorting ops. nvptx will have issue to compile them.
   @icemelon9 We need to enable thrust for ci gpu. https://github.com/apache/tvm/pull/7024


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] mbrookhart commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

mbrookhart commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738347327


   Yeah, the perf of the kernel isn't great, and I see some thread definition issues that will cause issues with dynamic shapes. Do we have a flaky test we can include? I don't think it's important for this PR, but it might be interesting to tackle later.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] mbrookhart commented on a change in pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

mbrookhart commented on a change in pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#discussion_r534568965



##########
File path: include/tvm/topi/transform.h
##########
@@ -598,17 +598,69 @@ inline te::Tensor dynamic_strided_slice(const te::Tensor& x, const te::Tensor& b
  *
  * \return A Tensor whose op member is the split operation
  */
-inline Tensor strided_slice(const Tensor& x, const Array<Integer>& begin, const Array<Integer>& end,
-                            const Array<Integer>& strides, std::string slice_mode = "end",
-                            std::string name = "T_strided_slice", std::string tag = kInjective) {
+inline Tensor strided_slice(const Tensor& x, const Array<PrimExpr>& begin,
+                            const Array<PrimExpr>& end, const Array<PrimExpr>& strides,
+                            std::string slice_mode = "end", std::string name = "T_strided_slice",
+                            std::string tag = kInjective) {
   size_t src_tensor_dim = static_cast<size_t>(x->shape.size());
+  // Quick path for dynamic shape strided slice.
+  // This is for ease of use to dynamice strided slice in topi.
+  bool is_dyn = false;
+  for (size_t i = 0; i < src_tensor_dim; ++i) {
+    if (!IsConstInt(x->shape[i])) {
+      is_dyn = true;
+      break;
+    }
+  }
+  if (!is_dyn) {
+    for (size_t i = 0; i < begin.size(); ++i) {
+      if (begin[i].defined() && !IsConstInt(begin[i])) {
+        is_dyn = true;
+        break;
+      }
+    }
+  }
+  if (!is_dyn) {
+    for (size_t i = 0; i < end.size(); ++i) {
+      if (end[i].defined() && !IsConstInt(end[i])) {
+        is_dyn = true;
+        break;
+      }
+    }
+  }
+  if (!is_dyn) {
+    for (size_t i = 0; i < strides.size(); ++i) {
+      if (strides[i].defined() && !IsConstInt(strides[i])) {
+        is_dyn = true;
+        break;
+      }
+    }
+  }

Review comment:
       Any chance we could do this in an outer loop over src_tensor_dim, begin, end, strides?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] mbrookhart commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

mbrookhart commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738243991


   I don't love making thrust a necessary component unless we automatically enable it when we turn on cuda? If we don't support the tir-based sort, should we remove it from the codebase?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] mbrookhart commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

mbrookhart commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738328610


   I'm not really sure what's wrong with the tir sort, do we have a regression test/issue we could track?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] zhiics merged pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

zhiics merged pull request #7018:
URL: https://github.com/apache/tvm/pull/7018


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] mbrookhart commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

mbrookhart commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738375832


   :/ OddEvenTransportSort should be stable, but something looks very wrong about the threading in this kernel. I'll see if I can edit to to solve these problems at some point in the near-ish future. If somehow this sort isn't stable, that would easily explain flakiness in argwhere/argsort.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] zhiics commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

zhiics commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738327979


   I think without thrust, we then have to fix sort. We can probably disable the test for now and come back to work on sorting and then enable the test. This would at least unblock downstream users to run models through thrust. @mbrookhart @icemelon9 @kevinthesun how do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] mbrookhart commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

mbrookhart commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-737647798


   Out of curiosity, why no nvptx?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] kevinthesun commented on a change in pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

kevinthesun commented on a change in pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#discussion_r534584859



##########
File path: include/tvm/topi/transform.h
##########
@@ -598,17 +598,69 @@ inline te::Tensor dynamic_strided_slice(const te::Tensor& x, const te::Tensor& b
  *
  * \return A Tensor whose op member is the split operation
  */
-inline Tensor strided_slice(const Tensor& x, const Array<Integer>& begin, const Array<Integer>& end,
-                            const Array<Integer>& strides, std::string slice_mode = "end",
-                            std::string name = "T_strided_slice", std::string tag = kInjective) {
+inline Tensor strided_slice(const Tensor& x, const Array<PrimExpr>& begin,
+                            const Array<PrimExpr>& end, const Array<PrimExpr>& strides,
+                            std::string slice_mode = "end", std::string name = "T_strided_slice",
+                            std::string tag = kInjective) {
   size_t src_tensor_dim = static_cast<size_t>(x->shape.size());
+  // Quick path for dynamic shape strided slice.
+  // This is for ease of use to dynamice strided slice in topi.
+  bool is_dyn = false;
+  for (size_t i = 0; i < src_tensor_dim; ++i) {
+    if (!IsConstInt(x->shape[i])) {
+      is_dyn = true;
+      break;
+    }
+  }
+  if (!is_dyn) {
+    for (size_t i = 0; i < begin.size(); ++i) {
+      if (begin[i].defined() && !IsConstInt(begin[i])) {
+        is_dyn = true;
+        break;
+      }
+    }
+  }
+  if (!is_dyn) {
+    for (size_t i = 0; i < end.size(); ++i) {
+      if (end[i].defined() && !IsConstInt(end[i])) {
+        is_dyn = true;
+        break;
+      }
+    }
+  }
+  if (!is_dyn) {
+    for (size_t i = 0; i < strides.size(); ++i) {
+      if (strides[i].defined() && !IsConstInt(strides[i])) {
+        is_dyn = true;
+        break;
+      }
+    }
+  }
+
+  Array<PrimExpr> out_shape;
+  if (is_dyn) {
+    for (size_t i = 0; i < src_tensor_dim; ++i) {
+      out_shape.push_back(indexdiv(end[i] - begin[i], strides[i]));
+    }
+    return te::compute(
+        out_shape,
+        [&](const Array<tvm::tir::Var>& indices) {
+          Array<PrimExpr> real_indices;
+          for (size_t i = 0; i < src_tensor_dim; ++i) {
+            real_indices.push_back(indices[i] * strides[i] + begin[i]);
+          }
+          return x(real_indices);
+        },
+        name, tag);
+  }
+

Review comment:
       That ```dynamic_strided_slice``` has ```begin```, ```end``` and ```strides``` as tensors. As a result we can't use them to do PrimExpr computation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] mbrookhart commented on a change in pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

mbrookhart commented on a change in pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#discussion_r534571294



##########
File path: include/tvm/topi/transform.h
##########
@@ -598,17 +598,69 @@ inline te::Tensor dynamic_strided_slice(const te::Tensor& x, const te::Tensor& b
  *
  * \return A Tensor whose op member is the split operation
  */
-inline Tensor strided_slice(const Tensor& x, const Array<Integer>& begin, const Array<Integer>& end,
-                            const Array<Integer>& strides, std::string slice_mode = "end",
-                            std::string name = "T_strided_slice", std::string tag = kInjective) {
+inline Tensor strided_slice(const Tensor& x, const Array<PrimExpr>& begin,
+                            const Array<PrimExpr>& end, const Array<PrimExpr>& strides,
+                            std::string slice_mode = "end", std::string name = "T_strided_slice",
+                            std::string tag = kInjective) {
   size_t src_tensor_dim = static_cast<size_t>(x->shape.size());
+  // Quick path for dynamic shape strided slice.
+  // This is for ease of use to dynamice strided slice in topi.
+  bool is_dyn = false;
+  for (size_t i = 0; i < src_tensor_dim; ++i) {
+    if (!IsConstInt(x->shape[i])) {
+      is_dyn = true;
+      break;
+    }
+  }
+  if (!is_dyn) {
+    for (size_t i = 0; i < begin.size(); ++i) {
+      if (begin[i].defined() && !IsConstInt(begin[i])) {
+        is_dyn = true;
+        break;
+      }
+    }
+  }
+  if (!is_dyn) {
+    for (size_t i = 0; i < end.size(); ++i) {
+      if (end[i].defined() && !IsConstInt(end[i])) {
+        is_dyn = true;
+        break;
+      }
+    }
+  }
+  if (!is_dyn) {
+    for (size_t i = 0; i < strides.size(); ++i) {
+      if (strides[i].defined() && !IsConstInt(strides[i])) {
+        is_dyn = true;
+        break;
+      }
+    }
+  }
+
+  Array<PrimExpr> out_shape;
+  if (is_dyn) {
+    for (size_t i = 0; i < src_tensor_dim; ++i) {
+      out_shape.push_back(indexdiv(end[i] - begin[i], strides[i]));
+    }
+    return te::compute(
+        out_shape,
+        [&](const Array<tvm::tir::Var>& indices) {
+          Array<PrimExpr> real_indices;
+          for (size_t i = 0; i < src_tensor_dim; ++i) {
+            real_indices.push_back(indices[i] * strides[i] + begin[i]);
+          }
+          return x(real_indices);
+        },
+        name, tag);
+  }
+

Review comment:
       Can you move this edit into the dynamic_strided_slice function above and call that function here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] zhiics commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

zhiics commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738563250


   Thanks @kevinthesun @mbrookhart @icemelon9 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] mbrookhart commented on a change in pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

mbrookhart commented on a change in pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#discussion_r535596794



##########
File path: python/tvm/topi/cuda/sort.py
##########
@@ -561,10 +561,11 @@ def topk_thrust(data, k=1, axis=-1, ret_type="both", is_ascend=False, dtype="int
         tag="topk_gpu",
     )
 
-    if k > 0:
+    if not isinstance(k, int) or k > 0:
         beg = [0] * ndim
-        end = data.shape[:-1] + [k]
-        out = [strided_slice(o, beg, end) for o in out]
+        end = data.shape[:-1] + [k if isinstance(k, int) else tvm.te.size_var("dim")]
+        strides = [1] * ndim
+        out = [strided_slice(o, beg, end, strides) for o in out]

Review comment:
       @kevinthesun, why don't we just repeat this change in the tir topk above? that would fix the unit test, I think.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] kevinthesun commented on pull request #7018: [Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi

Posted by GitBox <gi...@apache.org>.

kevinthesun commented on pull request #7018:
URL: https://github.com/apache/tvm/pull/7018#issuecomment-738287828


   I think we can raise an exception when compiling dynamic topk but Thrust is not enable. Building with Thrust usually needs extra effort since it requires cmake >=3.13. User can enable it when necessary. For tvm cuda sorting, I'm not sure whether it covers some cases  which Thrust doesn't. Maybe we can keep it a while.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org