Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/09/28 19:02:24 UTC

[GitHub] [incubator-tvm] tkonolige opened a new pull request #6580: Faster sparse_dense on GPUs.

tkonolige opened a new pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580


   I've written a faster sparse_dense for GPUs using TIR. This sparse_dense requires a padded matrix, so I've added a new op, sparse_dense_padded. AlterOpLayout should transform sparse_dense to sparse_dense_padded when using a GPU.
   
   This new sparse_dense improves prunebert performance from a mean of 155.41ms to a mean of 7.75ms. In general, this implementation is faster than cuBLAS dense on matrices with density < 0.05 and is often faster than cuSPARSE for machine learning workloads.
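   
   For illustration, a minimal sketch of driving the TOPI-level kernels directly, roughly mirroring the CUDA entry of the dispatch table used in the tests later in this thread (the sizes, density, and dtypes are arbitrary):
   
       import numpy as np
       import scipy.sparse as sp
       import tvm
       from tvm import te, topi
       
       # Random BSR weight matrix W (N x K) and dense input X (M x K); Y = X * W^T.
       M, N, K, BS_R, BS_C = 128, 3072, 768, 16, 1
       W_sp = sp.bsr_matrix(
           sp.random(N, K, density=0.05, format="csr", dtype="float32"),
           blocksize=(BS_R, BS_C),
       )
       X = te.placeholder((M, K), dtype="float32", name="X")
       W_data = te.placeholder(W_sp.data.shape, dtype="float32", name="W_data")
       W_indices = te.placeholder(W_sp.indices.shape, dtype="int32", name="W_indices")
       W_indptr = te.placeholder(W_sp.indptr.shape, dtype="int32", name="W_indptr")
       
       with tvm.target.Target("cuda"):
           Y = topi.cuda.sparse_dense(X, W_data, W_indices, W_indptr)
           s = topi.cuda.schedule_sparse_dense([Y])
       func = tvm.build(s, [X, W_data, W_indices, W_indptr, Y], "cuda")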





[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6580: Faster sparse_dense on GPUs

tkonolige commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r501193312



##########
File path: python/tvm/tir/ir_builder.py
##########
@@ -62,9 +62,10 @@ class BufferVar(ObjectGeneric):
     IRBuilder.allocate
     """
 
-    def __init__(self, builder, buffer_var, content_type):
+    def __init__(self, builder, buffer_var, shape, content_type):

Review comment:
       You can use shape=None and the code will work fine assuming you always use a single index.
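   
   A minimal sketch of that single-index mode (the names and extent here are illustrative):
   
       import tvm
       from tvm import te
       
       ib = tvm.tir.ir_builder.create()
       n = te.size_var("n")
       A = ib.allocate("float32", n, name="A", scope="global")
       with ib.for_range(0, n, name="i") as i:
           A[i] = 2.0  # a single flat index never needs the shape
       stmt = ib.get()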







[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6580: Faster sparse_dense on GPUs

tkonolige commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r498365231



##########
File path: tests/python/topi/python/test_topi_sparse.py
##########
@@ -342,38 +343,29 @@ def verify_sparse_dense_bsr(M, N, K, BS_R, BS_C, density, use_relu):
     W_indptr = te.placeholder(shape=W_sp_np.indptr.shape, dtype=str(W_sp_np.indptr.dtype))
     X = te.placeholder(shape=X_np.shape, dtype=str(X_np.dtype))
 
-    def check_device(device):
-        ctx = tvm.context(device, 0)
-        if not tvm.testing.device_enabled(device):
-            print("Skip because %s is not enabled" % device)
-            return
-        print("Running on target: %s" % device)
-        fcompute, fschedule = tvm.topi.testing.dispatch(device, _sparse_dense_implement)
-        with tvm.target.Target(device):
-            Y = fcompute(X, W_data, W_indices, W_indptr)
-            if use_relu:
-                Y = topi.nn.relu(Y)
-            s = fschedule([Y])
-            func = tvm.build(s, [X, W_data, W_indices, W_indptr, Y])
-            Y_tvm = tvm.nd.array(np.zeros(Y_np.shape, dtype=Y_np.dtype), ctx=ctx)
-            func(
-                tvm.nd.array(X_np, ctx=ctx),
-                tvm.nd.array(W_sp_np.data, ctx=ctx),
-                tvm.nd.array(W_sp_np.indices, ctx=ctx),
-                tvm.nd.array(W_sp_np.indptr, ctx=ctx),
-                Y_tvm,
-            )
-            tvm.testing.assert_allclose(Y_tvm.asnumpy(), Y_np, atol=1e-4, rtol=1e-4)
-
-    for device in ["llvm", "cuda"]:
-        check_device(device)
+    fcompute, fschedule = tvm.topi.testing.dispatch(target, _sparse_dense_implement)
+    with tvm.target.Target(target):
+        Y = fcompute(X, W_data, W_indices, W_indptr)
+        if use_relu:
+            Y = topi.nn.relu(Y)
+        s = fschedule([Y])
+        func = tvm.build(s, [X, W_data, W_indices, W_indptr, Y])
+        Y_tvm = tvm.nd.array(np.zeros(Y_np.shape, dtype=Y_np.dtype), ctx=ctx)
+        func(
+            tvm.nd.array(X_np, ctx=ctx),
+            tvm.nd.array(W_sp_np.data, ctx=ctx),
+            tvm.nd.array(W_sp_np.indices, ctx=ctx),
+            tvm.nd.array(W_sp_np.indptr, ctx=ctx),
+            Y_tvm,
+        )
+        tvm.testing.assert_allclose(Y_tvm.asnumpy(), Y_np, atol=1e-4, rtol=1e-4)
 
 
-@tvm.testing.uses_gpu
-def test_sparse_dense_bsr():
+@tvm.testing.parametrize_targets("llvm", "cuda")
+def test_sparse_dense_bsr_relu(ctx, target):

Review comment:
       This test is already enabled.
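   
   For reference, a minimal sketch of how the fixture-based pattern in the diff behaves (the test body is illustrative):
   
       import numpy as np
       import tvm
       import tvm.testing
       
       @tvm.testing.parametrize_targets("llvm", "cuda")
       def test_example(ctx, target):
           # pytest runs the body once per target; targets that are not
           # enabled are skipped automatically, and ctx is the matching
           # device context.
           x = tvm.nd.array(np.zeros((2, 2), dtype="float32"), ctx=ctx)
           tvm.testing.assert_allclose(x.asnumpy(), np.zeros((2, 2)))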







[GitHub] [incubator-tvm] ANSHUMAN87 commented on a change in pull request #6580: Faster sparse_dense on GPUs

ANSHUMAN87 commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r499213027



##########
File path: python/tvm/tir/ir_builder.py
##########
@@ -62,9 +62,10 @@ class BufferVar(ObjectGeneric):
     IRBuilder.allocate
     """
 
-    def __init__(self, builder, buffer_var, content_type):
+    def __init__(self, builder, buffer_var, shape, content_type):

Review comment:
       I don't think this change is necessary, but I would wait for your response to my other comment. Maybe after that we can discuss here :)







[GitHub] [incubator-tvm] ANSHUMAN87 commented on a change in pull request #6580: Faster sparse_dense on GPUs

ANSHUMAN87 commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r501862525



##########
File path: python/tvm/tir/ir_builder.py
##########
@@ -74,8 +75,23 @@ def asobject(self):
     def dtype(self):
         return self._content_type
 
+    def _linear_index(self, index):
+        if not isinstance(index, tuple):

Review comment:
       We can also check whether _shape is None, and return early here.
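   
   A sketch of that early return, reusing the linearization from the diff with the None check added (illustrative only):
   
       def _linear_index(self, index):
           # Flat index, or no shape information recorded: nothing to do.
           if not isinstance(index, tuple) or self._shape is None:
               return index
           assert len(index) == len(self._shape)
           dim_size = 1
           lidx = 0
           # Row-major linearization, innermost dimension first.
           for dim, idx in zip(reversed(self._shape), reversed(index)):
               lidx += idx * dim_size
               dim_size *= dim
           return lidx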

##########
File path: python/tvm/tir/ir_builder.py
##########
@@ -62,9 +62,10 @@ class BufferVar(ObjectGeneric):
     IRBuilder.allocate
     """
 
-    def __init__(self, builder, buffer_var, content_type):
+    def __init__(self, builder, buffer_var, shape, content_type):

Review comment:
       Request to add a small comment section describing the new way to use it.
   There are now two cases, an already linear index or an actual multidimensional index; documenting both will help in understanding the new change.

##########
File path: python/tvm/topi/nn/sparse.py
##########
@@ -207,3 +207,28 @@ def _csr_transpose_ir(data, indices, indptr, out_data, out_indices, out_indptr):
         last[0] = temp2[0]
 
     return irb.get()
+
+
+@tvm.target.generic_func
+def sparse_dense_alter_layout(_attrs, _inputs, _tinfos, _out_type):

Review comment:
       Any reason to use _ naming for inputs?

##########
File path: src/relay/op/nn/sparse.cc
##########
@@ -76,7 +76,36 @@ TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense")
     });
 
 RELAY_REGISTER_OP("nn.sparse_dense")
-    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with X sparse.
+    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with W sparse.
+
+- **data**: `(x1, x2, ..., xn, input_dim)`
+- **weight**: `(units, input_dim)`
+- **out**: `(x1, x2, ..., xn, units)`.
+
+)code" TVM_ADD_FILELINE)
+    .set_attrs_type<SparseDenseAttrs>()
+    .set_num_inputs(4)
+    .add_argument("data", "nD Tensor", "Input data.")
+    .add_argument("weight_data", "1D Tensor", "Weight data matrix.")
+    .add_argument("weight_indices", "1D Tensor", "Weight indices matrix.")
+    .add_argument("weight_indptr", "1D Tensor", "Weight indptr matrix.")
+    .set_support_level(1)
+    .add_type_rel("SparseDense", SparseDenseRel);
+
+Expr MakeSparseDensePadded(Expr data, Expr weight_data, Expr weight_indices, Expr weight_indptr) {
+  auto attrs = make_object<SparseDenseAttrs>();
+  static const Op& op = Op::Get("nn.sparse_dense_padded");
+  return Call(op, {data, weight_data, weight_indices, weight_indptr}, Attrs(attrs), {});
+}
+
+TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense_padded")
+    .set_body([](const TVMArgs& args, TVMRetValue* rv) {
+      runtime::detail::unpack_call<Expr, 4>(MakeSparseDensePadded, args, rv);
+    });
+
+RELAY_REGISTER_OP("nn.sparse_dense_padded")

Review comment:
       In which case will this Op be used independently?

##########
File path: python/tvm/topi/nn/sparse.py
##########
@@ -207,3 +207,28 @@ def _csr_transpose_ir(data, indices, indptr, out_data, out_indices, out_indptr):
         last[0] = temp2[0]
 
     return irb.get()
+
+
+@tvm.target.generic_func
+def sparse_dense_alter_layout(_attrs, _inputs, _tinfos, _out_type):

Review comment:
       Ok, got it. Thanks for clarifying!
   We can also use a pylint directive like below:
       # pylint: disable=unused-argument
       def func():


##########
File path: src/relay/op/nn/sparse.cc
##########
@@ -76,7 +76,36 @@ TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense")
     });
 
 RELAY_REGISTER_OP("nn.sparse_dense")
-    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with X sparse.
+    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with W sparse.
+
+- **data**: `(x1, x2, ..., xn, input_dim)`
+- **weight**: `(units, input_dim)`
+- **out**: `(x1, x2, ..., xn, units)`.
+
+)code" TVM_ADD_FILELINE)
+    .set_attrs_type<SparseDenseAttrs>()
+    .set_num_inputs(4)
+    .add_argument("data", "nD Tensor", "Input data.")
+    .add_argument("weight_data", "1D Tensor", "Weight data matrix.")
+    .add_argument("weight_indices", "1D Tensor", "Weight indices matrix.")
+    .add_argument("weight_indptr", "1D Tensor", "Weight indptr matrix.")
+    .set_support_level(1)
+    .add_type_rel("SparseDense", SparseDenseRel);
+
+Expr MakeSparseDensePadded(Expr data, Expr weight_data, Expr weight_indices, Expr weight_indptr) {
+  auto attrs = make_object<SparseDenseAttrs>();
+  static const Op& op = Op::Get("nn.sparse_dense_padded");
+  return Call(op, {data, weight_data, weight_indices, weight_indptr}, Attrs(attrs), {});
+}
+
+TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense_padded")
+    .set_body([](const TVMArgs& args, TVMRetValue* rv) {
+      runtime::detail::unpack_call<Expr, 4>(MakeSparseDensePadded, args, rv);
+    });
+
+RELAY_REGISTER_OP("nn.sparse_dense_padded")

Review comment:
       I am okay with the disclaimer :)
   If it is really not necessary, maybe we don't expose this as a global Op.
   I am not too sure; better we get some more opinions on this point.
   

##########
File path: src/relay/op/nn/sparse.cc
##########
@@ -76,7 +76,36 @@ TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense")
     });
 
 RELAY_REGISTER_OP("nn.sparse_dense")
-    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with X sparse.
+    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with W sparse.
+
+- **data**: `(x1, x2, ..., xn, input_dim)`
+- **weight**: `(units, input_dim)`
+- **out**: `(x1, x2, ..., xn, units)`.
+
+)code" TVM_ADD_FILELINE)
+    .set_attrs_type<SparseDenseAttrs>()
+    .set_num_inputs(4)
+    .add_argument("data", "nD Tensor", "Input data.")
+    .add_argument("weight_data", "1D Tensor", "Weight data matrix.")
+    .add_argument("weight_indices", "1D Tensor", "Weight indices matrix.")
+    .add_argument("weight_indptr", "1D Tensor", "Weight indptr matrix.")
+    .set_support_level(1)
+    .add_type_rel("SparseDense", SparseDenseRel);
+
+Expr MakeSparseDensePadded(Expr data, Expr weight_data, Expr weight_indices, Expr weight_indptr) {
+  auto attrs = make_object<SparseDenseAttrs>();
+  static const Op& op = Op::Get("nn.sparse_dense_padded");
+  return Call(op, {data, weight_data, weight_indices, weight_indptr}, Attrs(attrs), {});
+}
+
+TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense_padded")
+    .set_body([](const TVMArgs& args, TVMRetValue* rv) {
+      runtime::detail::unpack_call<Expr, 4>(MakeSparseDensePadded, args, rv);
+    });
+
+RELAY_REGISTER_OP("nn.sparse_dense_padded")

Review comment:
       I think "nn.internal.xxx" is a good option. 
   







[GitHub] [incubator-tvm] ANSHUMAN87 commented on a change in pull request #6580: Faster sparse_dense on GPUs

ANSHUMAN87 commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r501061630



##########
File path: python/tvm/tir/ir_builder.py
##########
@@ -62,9 +62,10 @@ class BufferVar(ObjectGeneric):
     IRBuilder.allocate
     """
 
-    def __init__(self, builder, buffer_var, content_type):
+    def __init__(self, builder, buffer_var, shape, content_type):

Review comment:
       I have one query here. What happens if we don't have the shape info? In that case, how do you enforce the check in _linear_index?
   
   For example: we create a buffer pointer out of an existing buffer. In that case we may not necessarily have the shape info, so how does it work?










[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6580: Faster sparse_dense on GPUs

merrymercy edited a comment on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-706581124


   @tkonolige @tqchen This commit fails in the master branch; see the CI: https://github.com/apache/incubator-tvm/commits/master
   
   It introduces a flaky test that blocks two of my PRs.





[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6580: Faster sparse_dense on GPUs

tkonolige commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r501889549



##########
File path: python/tvm/topi/nn/sparse.py
##########
@@ -207,3 +207,28 @@ def _csr_transpose_ir(data, indices, indptr, out_data, out_indices, out_indptr):
         last[0] = temp2[0]
 
     return irb.get()
+
+
+@tvm.target.generic_func
+def sparse_dense_alter_layout(_attrs, _inputs, _tinfos, _out_type):

Review comment:
       Pylint complains these inputs are unused. Prefixing them with _ silences the warning and indicates to other people that the arguments are unused.
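   
   A small sketch contrasting the two styles (the second function name is hypothetical, for illustration only):
   
       import tvm
       
       # Style used in this PR: leading underscores mark unused arguments.
       @tvm.target.generic_func
       def sparse_dense_alter_layout(_attrs, _inputs, _tinfos, _out_type):
           return None  # returning None leaves the op unchanged
       
       # Alternative raised in review: silence pylint with a directive.
       # pylint: disable=unused-argument
       def sparse_dense_alter_layout_alt(attrs, inputs, tinfos, out_type):
           return None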

##########
File path: src/relay/op/nn/sparse.cc
##########
@@ -76,7 +76,36 @@ TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense")
     });
 
 RELAY_REGISTER_OP("nn.sparse_dense")
-    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with X sparse.
+    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with W sparse.
+
+- **data**: `(x1, x2, ..., xn, input_dim)`
+- **weight**: `(units, input_dim)`
+- **out**: `(x1, x2, ..., xn, units)`.
+
+)code" TVM_ADD_FILELINE)
+    .set_attrs_type<SparseDenseAttrs>()
+    .set_num_inputs(4)
+    .add_argument("data", "nD Tensor", "Input data.")
+    .add_argument("weight_data", "1D Tensor", "Weight data matrix.")
+    .add_argument("weight_indices", "1D Tensor", "Weight indices matrix.")
+    .add_argument("weight_indptr", "1D Tensor", "Weight indptr matrix.")
+    .set_support_level(1)
+    .add_type_rel("SparseDense", SparseDenseRel);
+
+Expr MakeSparseDensePadded(Expr data, Expr weight_data, Expr weight_indices, Expr weight_indptr) {
+  auto attrs = make_object<SparseDenseAttrs>();
+  static const Op& op = Op::Get("nn.sparse_dense_padded");
+  return Call(op, {data, weight_data, weight_indices, weight_indptr}, Attrs(attrs), {});
+}
+
+TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense_padded")
+    .set_body([](const TVMArgs& args, TVMRetValue* rv) {
+      runtime::detail::unpack_call<Expr, 4>(MakeSparseDensePadded, args, rv);
+    });
+
+RELAY_REGISTER_OP("nn.sparse_dense_padded")

Review comment:
       This op shouldn't be used directly by a user unless they are sure that the input matrix is in the correct format. I've documented this.

##########
File path: src/relay/op/nn/sparse.cc
##########
@@ -76,7 +76,36 @@ TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense")
     });
 
 RELAY_REGISTER_OP("nn.sparse_dense")
-    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with X sparse.
+    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with W sparse.
+
+- **data**: `(x1, x2, ..., xn, input_dim)`
+- **weight**: `(units, input_dim)`
+- **out**: `(x1, x2, ..., xn, units)`.
+
+)code" TVM_ADD_FILELINE)
+    .set_attrs_type<SparseDenseAttrs>()
+    .set_num_inputs(4)
+    .add_argument("data", "nD Tensor", "Input data.")
+    .add_argument("weight_data", "1D Tensor", "Weight data matrix.")
+    .add_argument("weight_indices", "1D Tensor", "Weight indices matrix.")
+    .add_argument("weight_indptr", "1D Tensor", "Weight indptr matrix.")
+    .set_support_level(1)
+    .add_type_rel("SparseDense", SparseDenseRel);
+
+Expr MakeSparseDensePadded(Expr data, Expr weight_data, Expr weight_indices, Expr weight_indptr) {
+  auto attrs = make_object<SparseDenseAttrs>();
+  static const Op& op = Op::Get("nn.sparse_dense_padded");
+  return Call(op, {data, weight_data, weight_indices, weight_indptr}, Attrs(attrs), {});
+}
+
+TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense_padded")
+    .set_body([](const TVMArgs& args, TVMRetValue* rv) {
+      runtime::detail::unpack_call<Expr, 4>(MakeSparseDensePadded, args, rv);
+    });
+
+RELAY_REGISTER_OP("nn.sparse_dense_padded")

Review comment:
       How can I not expose it? Do I just change the name to "internal.nn.sparse_dense_padded"?

##########
File path: src/relay/op/nn/sparse.cc
##########
@@ -76,7 +76,36 @@ TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense")
     });
 
 RELAY_REGISTER_OP("nn.sparse_dense")
-    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with X sparse.
+    .describe(R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with W sparse.
+
+- **data**: `(x1, x2, ..., xn, input_dim)`
+- **weight**: `(units, input_dim)`
+- **out**: `(x1, x2, ..., xn, units)`.
+
+)code" TVM_ADD_FILELINE)
+    .set_attrs_type<SparseDenseAttrs>()
+    .set_num_inputs(4)
+    .add_argument("data", "nD Tensor", "Input data.")
+    .add_argument("weight_data", "1D Tensor", "Weight data matrix.")
+    .add_argument("weight_indices", "1D Tensor", "Weight indices matrix.")
+    .add_argument("weight_indptr", "1D Tensor", "Weight indptr matrix.")
+    .set_support_level(1)
+    .add_type_rel("SparseDense", SparseDenseRel);
+
+Expr MakeSparseDensePadded(Expr data, Expr weight_data, Expr weight_indices, Expr weight_indptr) {
+  auto attrs = make_object<SparseDenseAttrs>();
+  static const Op& op = Op::Get("nn.sparse_dense_padded");
+  return Call(op, {data, weight_data, weight_indices, weight_indptr}, Attrs(attrs), {});
+}
+
+TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense_padded")
+    .set_body([](const TVMArgs& args, TVMRetValue* rv) {
+      runtime::detail::unpack_call<Expr, 4>(MakeSparseDensePadded, args, rv);
+    });
+
+RELAY_REGISTER_OP("nn.sparse_dense_padded")

Review comment:
       Done.







[GitHub] [incubator-tvm] ANSHUMAN87 commented on a change in pull request #6580: Faster sparse_dense on GPUs

ANSHUMAN87 commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r499212868



##########
File path: python/tvm/tir/ir_builder.py
##########
@@ -74,8 +75,23 @@ def asobject(self):
     def dtype(self):
         return self._content_type
 
+    def _linear_index(self, index):
+        if not isinstance(index, tuple):
+            return index
+        assert len(index) == len(self._shape), "Index size (%s) does not match shape size (%s)" % (
+            len(index),
+            len(self._shape),
+        )
+        dim_size = 1
+        lidx = 0
+        for dim, idx in zip(reversed(self._shape), reversed(index)):
+            lidx += idx * dim_size
+            dim_size *= dim
+        return lidx
+
     def __getitem__(self, index):
         t = DataType(self._content_type)
+        index = self._linear_index(index)

Review comment:
       @tkonolige : I am just trying to understand why this change is required for your use case.
   Would you please help me understand? Thanks!













[GitHub] [incubator-tvm] merrymercy commented on pull request #6580: Faster sparse_dense on GPUs

merrymercy commented on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-706581124


   @tkonolige @tqchen This commit fails to build in the master branch; see the CI: https://github.com/apache/incubator-tvm/commits/master
   
   It introduces a flaky test that blocks two of my PRs.








[GitHub] [incubator-tvm] ANSHUMAN87 commented on pull request #6580: Faster sparse_dense on GPUs

ANSHUMAN87 commented on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-700929567


   > I've written a faster sparse_dense for GPUs using TIR. This sparse_dense requires a padded matrix, so I've added a new op, sparse_dense_padded. AlterOpLayout should transform sparse_dense to sparse_dense_padded when using a GPU.
   > 
   > This new sparse_dense improves prunebert performance from a mean of 155.41ms to a mean of 7.75ms. In general, this implementation is faster than cuBLAS dense on matrices with density < 0.05 and is often faster than cuSPARSE for machine learning workloads.
   
   @tkonolige : Thanks for the PR! The data looks quite impressive :+1: 
   I was wondering whether we could add some sort of benchmark testcase here, tuned to your shared data?
   
   





[GitHub] [incubator-tvm] ANSHUMAN87 commented on a change in pull request #6580: Faster sparse_dense on GPUs

ANSHUMAN87 commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r501054040



##########
File path: python/tvm/tir/ir_builder.py
##########
@@ -74,8 +75,23 @@ def asobject(self):
     def dtype(self):
         return self._content_type
 
+    def _linear_index(self, index):
+        if not isinstance(index, tuple):
+            return index
+        assert len(index) == len(self._shape), "Index size (%s) does not match shape size (%s)" % (
+            len(index),
+            len(self._shape),
+        )
+        dim_size = 1
+        lidx = 0
+        for dim, idx in zip(reversed(self._shape), reversed(index)):
+            lidx += idx * dim_size
+            dim_size *= dim
+        return lidx
+
     def __getitem__(self, index):
         t = DataType(self._content_type)
+        index = self._linear_index(index)

Review comment:
       Thanks @tkonolige for the clarification!







[GitHub] [incubator-tvm] ANSHUMAN87 commented on pull request #6580: Faster sparse_dense on GPUs

ANSHUMAN87 commented on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-701169720


   @tkonolige : I understand your concern clearly. However, it was just a thought. Even with run-to-run or machine-to-machine differences, the relative reference would be the same. But maybe we don't have to do it as part of this PR :)
   I will go deeper into your PR and will share my comments, if any. Thanks!








[GitHub] [incubator-tvm] Laurawly commented on a change in pull request #6580: Faster sparse_dense on GPUs

Laurawly commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r498607086



##########
File path: python/tvm/topi/cuda/sparse.py
##########
@@ -97,3 +94,287 @@ def _callback(op):
 
     traverse_inline(s, outs[0].op, _callback)
     return s
+
+
+def schedule_cuda_transpose(s, out):
+    """Schedule for transpose on the gpu.
+
+    Roughly follows this:
+    https://developer.nvidia.com/blog/efficient-matrix-transpose-cuda-cc/, but
+    without the padding for shared memory. For better performance, we could
+    rewrite it in tir to add the padding.
+    """
+
+    def _callback(op):
+        # pylint: disable=invalid-name
+        m, n = s[op].op.axis
+        warp_size = int(tvm.target.Target.current(allow_none=False).thread_warp_size)

Review comment:
       Did you compare the performance with one warp and multiple warps?










[GitHub] [incubator-tvm] jroesch merged pull request #6580: Faster sparse_dense on GPUs

jroesch merged pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580


   





[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6580: Faster sparse_dense on GPUs

tkonolige commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r498364820



##########
File path: src/relay/op/nn/sparse.cc
##########
@@ -82,6 +82,35 @@ RELAY_REGISTER_OP("nn.sparse_dense")
 - **weight**: `(units, input_dim)`
 - **out**: `(x1, x2, ..., xn, units)`.
 
+)code" TVM_ADD_FILELINE)
+    .set_attrs_type<SparseDenseAttrs>()
+    .set_num_inputs(4)
+    .add_argument("data", "nD Tensor", "Input data.")
+    .add_argument("weight_data", "1D Tensor", "Weight data matrix.")
+    .add_argument("weight_indices", "1D Tensor", "Weight indices matrix.")
+    .add_argument("weight_indptr", "1D Tensor", "Weight indptr matrix.")
+    .set_support_level(1)
+    .add_type_rel("SparseDense", SparseDenseRel);
+
+Expr MakeSparseDensePadded(Expr data, Expr weight_data, Expr weight_indices, Expr weight_indptr) {
+  auto attrs = make_object<SparseDenseAttrs>();
+  static const Op& op = Op::Get("nn.sparse_dense_padded");
+  return Call(op, {data, weight_data, weight_indices, weight_indptr}, Attrs(attrs), {});
+}
+
+TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense_padded")
+    .set_body([](const TVMArgs& args, TVMRetValue* rv) {
+      runtime::detail::unpack_call<Expr, 4>(MakeSparseDensePadded, args, rv);
+    });
+
+RELAY_REGISTER_OP("nn.sparse_dense_padded")
+    .describe(
+        R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with X sparse. This variation uses a matrix with row lengths padded to a multiple of 32 for better GPU performance.

Review comment:
       Good catch, fixed it for sparse_dense too.
















[GitHub] [incubator-tvm] tkonolige commented on pull request #6580: Faster sparse_dense on GPUs

tkonolige commented on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-701024445


   @ANSHUMAN87 Right now TVM does not do any testing for performance regressions. The hard part in setting up performance testing is that it varies from run to run and machine to machine.





[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6580: Faster sparse_dense on GPUs

tkonolige commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r499745730



##########
File path: python/tvm/tir/ir_builder.py
##########
@@ -74,8 +75,23 @@ def asobject(self):
     def dtype(self):
         return self._content_type
 
+    def _linear_index(self, index):
+        if not isinstance(index, tuple):
+            return index
+        assert len(index) == len(self._shape), "Index size (%s) does not match shape size (%s)" % (
+            len(index),
+            len(self._shape),
+        )
+        dim_size = 1
+        lidx = 0
+        for dim, idx in zip(reversed(self._shape), reversed(index)):
+            lidx += idx * dim_size
+            dim_size *= dim
+        return lidx
+
     def __getitem__(self, index):
         t = DataType(self._content_type)
+        index = self._linear_index(index)

Review comment:
       This allows you to use multidimensional indices with IRBuilder. Manually calculating the correct index into multidimensional arrays caused a couple of bugs when I was developing this code. Using multidimensional indices made everything cleaner and easier to debug.
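   
   A tiny worked example of the row-major mapping this adds (shape and indices arbitrary):
   
       # For shape (2, 3), offset(i, j) = i * 3 + j, so a[1, 2] addresses
       # the same element as the flat a[5].
       def linear(index, shape=(2, 3)):
           lidx, dim_size = 0, 1
           for dim, idx in zip(reversed(shape), reversed(index)):
               lidx += idx * dim_size
               dim_size *= dim
           return lidx
       
       assert linear((1, 2)) == 5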













[GitHub] [incubator-tvm] ANSHUMAN87 commented on pull request #6580: Faster sparse_dense on GPUs

ANSHUMAN87 commented on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-706651881


   > @tkonolige @tqchen This commit fails in the master branch; see the CI: https://github.com/apache/incubator-tvm/commits/master
   > 
   > It introduces a flaky test that blocks two of my PRs.
   
   #6658 has resolved the issue, I think.





[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6580: Faster sparse_dense on GPUs

tkonolige commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r499018751



##########
File path: python/tvm/topi/cuda/sparse.py
##########
@@ -97,3 +94,287 @@ def _callback(op):
 
     traverse_inline(s, outs[0].op, _callback)
     return s
+
+
+def schedule_cuda_transpose(s, out):
+    """Schedule for transpose on the gpu.
+
+    Roughly follows this:
+    https://developer.nvidia.com/blog/efficient-matrix-transpose-cuda-cc/, but
+    without the padding for shared memory. For better performance, we could
+    rewrite it in tir to add the padding.
+    """
+
+    def _callback(op):
+        # pylint: disable=invalid-name
+        m, n = s[op].op.axis
+        warp_size = int(tvm.target.Target.current(allow_none=False).thread_warp_size)

Review comment:
       Yes, 4 warps per block is slightly faster than 1 warp per block. I've added a comment to this effect.
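   
   A rough sketch of the warp-based tiling under discussion, as a hand-written approximation rather than the PR's exact schedule (the shared-memory staging is omitted for brevity):
   
       import tvm
       from tvm import te
       
       def schedule_transpose_sketch(s, op):
           warp_size = int(tvm.target.Target.current(allow_none=False).thread_warp_size)
           m, n = s[op].op.axis
           mo, mi = s[op].split(m, factor=warp_size)
           no, ni = s[op].split(n, factor=warp_size)
           s[op].reorder(mo, no, mi, ni)
           s[op].bind(mo, te.thread_axis("blockIdx.x"))
           s[op].bind(no, te.thread_axis("blockIdx.y"))
           # 4 warps per block: threadIdx.y has extent 4, and each thread
           # serially covers warp_size / 4 rows of its tile.
           mio, mii = s[op].split(mi, nparts=4)
           s[op].bind(mio, te.thread_axis("threadIdx.y"))
           s[op].bind(ni, te.thread_axis("threadIdx.x"))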







[GitHub] [incubator-tvm] tkonolige commented on pull request #6580: Faster sparse_dense on GPUs

tkonolige commented on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-706658796


   @merrymercy I think that was the diagnostics.





[GitHub] [incubator-tvm] ANSHUMAN87 commented on a change in pull request #6580: Faster sparse_dense on GPUs

ANSHUMAN87 commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r501082471



##########
File path: python/tvm/tir/ir_builder.py
##########
@@ -62,9 +62,10 @@ class BufferVar(ObjectGeneric):
     IRBuilder.allocate
     """
 
-    def __init__(self, builder, buffer_var, content_type):
+    def __init__(self, builder, buffer_var, shape, content_type):

Review comment:
       This change is okay, post clarification!







[GitHub] [incubator-tvm] ANSHUMAN87 commented on a change in pull request #6580: Faster sparse_dense on GPUs

Posted by GitBox <gi...@apache.org>.
ANSHUMAN87 commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r501947855



##########
File path: python/tvm/topi/nn/sparse.py
##########
@@ -207,3 +207,28 @@ def _csr_transpose_ir(data, indices, indptr, out_data, out_indices, out_indptr):
         last[0] = temp2[0]
 
     return irb.get()
+
+
+@tvm.target.generic_func
+def sparse_dense_alter_layout(_attrs, _inputs, _tinfos, _out_type):

Review comment:
       Ok, got it. Thanks for clarifying!
   We can also use a pylint directive like the one below, instead of prefixing the arguments with underscores:
     # pylint: disable=unused-argument
   def func():
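
   For illustration, a hedged sketch of how a target-specific override of this generic function is usually registered (the override body below is a placeholder, not the PR's implementation, which rewrites the call to nn.sparse_dense_padded):

       import tvm

       @tvm.target.generic_func
       def sparse_dense_alter_layout(_attrs, _inputs, _tinfos, _out_type):
           """Change the layout of sparse_dense; the default does nothing."""
           return None

       # Override dispatched when compiling for CUDA/GPU targets.
       @sparse_dense_alter_layout.register(["cuda", "gpu"])
       def _alter_sparse_dense_layout_cuda(attrs, inputs, tinfos, out_type):
           return None  # placeholder; a real override returns a rewritten call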







[GitHub] [incubator-tvm] mkstarr commented on pull request #6580: Faster sparse_dense on GPUs

Posted by GitBox <gi...@apache.org>.
mkstarr commented on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-701185298


   Have you considered your syntax errors?





[GitHub] [incubator-tvm] ANSHUMAN87 commented on a change in pull request #6580: Faster sparse_dense on GPUs

Posted by GitBox <gi...@apache.org>.
ANSHUMAN87 commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r501864035



##########
File path: python/tvm/tir/ir_builder.py
##########
@@ -62,9 +62,10 @@ class BufferVar(ObjectGeneric):
     IRBuilder.allocate
     """
 
-    def __init__(self, builder, buffer_var, content_type):
+    def __init__(self, builder, buffer_var, shape, content_type):

Review comment:
       Request to add a small comment section describing the new way to use it.
   Since there are now two cases, indexing with an already-linearized index or with actual multi-dimensional indices, such a note will help readers understand the change better.
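
   For illustration, a minimal sketch of the two cases (assuming the new shape-aware BufferVar; the buffer names and shapes here are made up):

       import tvm

       ib = tvm.tir.ir_builder.create()
       n, m = 4, 8
       # Case 1: allocate with a multi-dimensional shape and use actual
       # indices; the buffer linearizes them internally.
       buf2d = ib.allocate("float32", (n, m), name="buf2d", scope="local")
       # Case 2: allocate flat and pass an already-linearized index.
       buf1d = ib.allocate("float32", (n * m,), name="buf1d", scope="local")
       with ib.for_range(0, n, name="i") as i:
           with ib.for_range(0, m, name="j") as j:
               buf2d[i, j] = 0.0
               buf1d[i * m + j] = 0.0
       stmt = ib.get()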







[GitHub] [incubator-tvm] vinx13 commented on a change in pull request #6580: Faster sparse_dense on GPUs

Posted by GitBox <gi...@apache.org>.
vinx13 commented on a change in pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#discussion_r497701322



##########
File path: tests/python/topi/python/test_topi_sparse.py
##########
@@ -342,38 +343,29 @@ def verify_sparse_dense_bsr(M, N, K, BS_R, BS_C, density, use_relu):
     W_indptr = te.placeholder(shape=W_sp_np.indptr.shape, dtype=str(W_sp_np.indptr.dtype))
     X = te.placeholder(shape=X_np.shape, dtype=str(X_np.dtype))
 
-    def check_device(device):
-        ctx = tvm.context(device, 0)
-        if not tvm.testing.device_enabled(device):
-            print("Skip because %s is not enabled" % device)
-            return
-        print("Running on target: %s" % device)
-        fcompute, fschedule = tvm.topi.testing.dispatch(device, _sparse_dense_implement)
-        with tvm.target.Target(device):
-            Y = fcompute(X, W_data, W_indices, W_indptr)
-            if use_relu:
-                Y = topi.nn.relu(Y)
-            s = fschedule([Y])
-            func = tvm.build(s, [X, W_data, W_indices, W_indptr, Y])
-            Y_tvm = tvm.nd.array(np.zeros(Y_np.shape, dtype=Y_np.dtype), ctx=ctx)
-            func(
-                tvm.nd.array(X_np, ctx=ctx),
-                tvm.nd.array(W_sp_np.data, ctx=ctx),
-                tvm.nd.array(W_sp_np.indices, ctx=ctx),
-                tvm.nd.array(W_sp_np.indptr, ctx=ctx),
-                Y_tvm,
-            )
-            tvm.testing.assert_allclose(Y_tvm.asnumpy(), Y_np, atol=1e-4, rtol=1e-4)
-
-    for device in ["llvm", "cuda"]:
-        check_device(device)
+    fcompute, fschedule = tvm.topi.testing.dispatch(target, _sparse_dense_implement)
+    with tvm.target.Target(target):
+        Y = fcompute(X, W_data, W_indices, W_indptr)
+        if use_relu:
+            Y = topi.nn.relu(Y)
+        s = fschedule([Y])
+        func = tvm.build(s, [X, W_data, W_indices, W_indptr, Y])
+        Y_tvm = tvm.nd.array(np.zeros(Y_np.shape, dtype=Y_np.dtype), ctx=ctx)
+        func(
+            tvm.nd.array(X_np, ctx=ctx),
+            tvm.nd.array(W_sp_np.data, ctx=ctx),
+            tvm.nd.array(W_sp_np.indices, ctx=ctx),
+            tvm.nd.array(W_sp_np.indptr, ctx=ctx),
+            Y_tvm,
+        )
+        tvm.testing.assert_allclose(Y_tvm.asnumpy(), Y_np, atol=1e-4, rtol=1e-4)
 
 
-@tvm.testing.uses_gpu
-def test_sparse_dense_bsr():
+@tvm.testing.parametrize_targets("llvm", "cuda")
+def test_sparse_dense_bsr_relu(ctx, target):

Review comment:
       enable this test?
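
       For context, a minimal sketch of the parametrize_targets pattern this diff moves to (an illustrative test, not one from the PR): pytest invokes the function once per listed target, with ctx already set to the matching device.

           import numpy as np
           import tvm
           import tvm.testing

           @tvm.testing.parametrize_targets("llvm", "cuda")
           def test_roundtrip(ctx, target):
               x_np = np.random.rand(8).astype("float32")
               x = tvm.nd.array(x_np, ctx=ctx)
               tvm.testing.assert_allclose(x.asnumpy(), x_np)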

##########
File path: src/relay/op/nn/sparse.cc
##########
@@ -82,6 +82,35 @@ RELAY_REGISTER_OP("nn.sparse_dense")
 - **weight**: `(units, input_dim)`
 - **out**: `(x1, x2, ..., xn, units)`.
 
+)code" TVM_ADD_FILELINE)
+    .set_attrs_type<SparseDenseAttrs>()
+    .set_num_inputs(4)
+    .add_argument("data", "nD Tensor", "Input data.")
+    .add_argument("weight_data", "1D Tensor", "Weight data matrix.")
+    .add_argument("weight_indices", "1D Tensor", "Weight indices matrix.")
+    .add_argument("weight_indptr", "1D Tensor", "Weight indptr matrix.")
+    .set_support_level(1)
+    .add_type_rel("SparseDense", SparseDenseRel);
+
+Expr MakeSparseDensePadded(Expr data, Expr weight_data, Expr weight_indices, Expr weight_indptr) {
+  auto attrs = make_object<SparseDenseAttrs>();
+  static const Op& op = Op::Get("nn.sparse_dense_padded");
+  return Call(op, {data, weight_data, weight_indices, weight_indptr}, Attrs(attrs), {});
+}
+
+TVM_REGISTER_GLOBAL("relay.op.nn._make.sparse_dense_padded")
+    .set_body([](const TVMArgs& args, TVMRetValue* rv) {
+      runtime::detail::unpack_call<Expr, 4>(MakeSparseDensePadded, args, rv);
+    });
+
+RELAY_REGISTER_OP("nn.sparse_dense_padded")
+    .describe(
+        R"code(Applies a sparse linear transformation: :math:`Y = XW^T` with X sparse. This variation uses a matrix with row lengths padded to a multiple of 32 for better GPU performance.

Review comment:
       with W sparse?
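
       As a usage note, the TVM_REGISTER_GLOBAL above exposes the op to Python under the relay.op.nn._make namespace; a hedged sketch of a thin wrapper, mirroring how nn.sparse_dense is wrapped (the wrapper itself is illustrative):

           from tvm.relay.op.nn import _make

           def sparse_dense_padded(data, weight_data, weight_indices, weight_indptr):
               """Build a call to the padded sparse_dense op registered above."""
               return _make.sparse_dense_padded(
                   data, weight_data, weight_indices, weight_indptr
               )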







[GitHub] [incubator-tvm] tkonolige commented on pull request #6580: Faster sparse_dense on GPUs

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-701490576


   @vinx13 @antinucleon @Laurawly @jwfromm @ajtulloch I think this is ready for review.

