Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/06/10 07:05:41 UTC

[GitHub] [tvm] jcf94 opened a new pull request #8234: [Matmul] Add a matmul op for Relay

jcf94 opened a new pull request #8234:
URL: https://github.com/apache/tvm/pull/8234


   Thanks for contributing to TVM! Please refer to the contribution guidelines at https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in the pull request thread.
   





[GitHub] [tvm] jcf94 commented on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-870515314


   Thanks! @comaniac @tkonolige  Most comments have been addressed.





[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660225637



##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,6 +1471,47 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Matmul operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+        `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.

Review comment:
       No, the input of matmul is supposed to be a multi-dimensional tensor (not limited to 2-D). This docstring is copied from the original `nn.dense`.
   
   Other frameworks like PyTorch also define it this way.
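   
   For illustration, a minimal NumPy sketch (not TVM code, just an assumption-level illustration) of the multi-dimensional behaviour described above: only the last axis of `data` takes part in the reduction, exactly as in `nn.dense`.
   
   ```python
   import numpy as np
   
   # data: (d_1, ..., d_n, units_in); weight: (units, units_in), i.e. the transposed ("NT") layout.
   data = np.random.rand(2, 3, 8)   # (d_1, d_2, units_in)
   weight = np.random.rand(4, 8)    # (units, units_in)
   
   out = data @ weight.T            # shape (2, 3, 4) == (d_1, d_2, units)
   print(out.shape)
   ```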







[GitHub] [tvm] jcf94 merged pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 merged pull request #8234:
URL: https://github.com/apache/tvm/pull/8234


   





[GitHub] [tvm] comaniac commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r649373782



##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,7 +1471,46 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
-def dense(data, weight, units=None, out_dtype=""):
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Dense operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+    `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
+
+    weight : tvm.relay.Expr
+        The weight expressions, 2-D matrix,
+        of shape `(units_in, units)` or `(units, units_in)`.
+
+    units : int, optional
+        Number of hidden units of the dense transformation.
+
+    out_dtype : str, optional
+        Specifies the output data type for mixed precision dense,
+        of shape `(d_1, d_2, ..., d_n, units)`.
+
+    data_transposed : bool, optional
+        Whether the data tensor is in transposed format.
+
+    weight_transposed : bool, optional
+        Whether the weight tensor is in transposed format.
+
+    Returns
+    -------
+    result : tvm.relay.Expr
+        The computed result.
+    """
+    return _make.matmul(data, weight, units, out_dtype, data_transposed, weight_transposed)
+
+
+def dense(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=True):

Review comment:
       It's weird to have transpose arguments in dense. This op should enforce transposed weight. Ideally, the implementation of dense in Relay level should be:
   
   ```
   return _make.matmul(data, weight, units, out_dtype, False, True)
   ```

##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,7 +1471,46 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
-def dense(data, weight, units=None, out_dtype=""):
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Dense operator.

Review comment:
       ```suggestion
       """Matmul operator.
   ```

##########
File path: python/tvm/relay/op/strategy/cuda.py
##########
@@ -698,6 +698,26 @@ def conv1d_transpose_strategy_cuda(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register(["cuda", "gpu"])
+def matmul_strategy_cuda(attrs, inputs, out_type, target):

Review comment:
       It's highly possible that this function generates no strategy. We need a check that throws an error, as well as a workaround for this situation.

##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:

Review comment:
       nit: This might be clearer:
   
   ```
   if (data_transposed, weight_transposed) == (False, False):
       # ...
   elif (data_transposed, weight_transposed) == (False, True):
       # ... 
   elif (data_transposed, weight_transposed) == (True, False):
       # ...
   elif (data_transposed, weight_transposed) == (True, True):
       # ...
   ```
   
   Also I think it's fine to just use `T_matmul_NT` instead of `T_dense`.
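   
   As a plain-NumPy illustration of the suggested tuple dispatch (the `T_matmul_*` strings below are only labels in this sketch, not names guaranteed by the PR):
   
   ```python
   import numpy as np
   
   def matmul_2d(a, b, transpose_a=False, transpose_b=False):
       # One entry per (transpose_a, transpose_b) combination, as suggested above.
       computes = {
           (False, False): lambda: a @ b,      # "T_matmul_NN"
           (False, True):  lambda: a @ b.T,    # "T_matmul_NT" -- same math as nn.dense
           (True,  False): lambda: a.T @ b,    # "T_matmul_TN"
           (True,  True):  lambda: a.T @ b.T,  # "T_matmul_TT"
       }
       return computes[(transpose_a, transpose_b)]()
   
   a, b = np.ones((2, 3)), np.ones((4, 3))
   print(matmul_2d(a, b, transpose_b=True).shape)  # (2, 4)
   ```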

##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,7 +1471,46 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
-def dense(data, weight, units=None, out_dtype=""):
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Dense operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+    `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
+
+    weight : tvm.relay.Expr
+        The weight expressions, 2-D matrix,
+        of shape `(units_in, units)` or `(units, units_in)`.
+
+    units : int, optional
+        Number of hidden units of the dense transformation.
+
+    out_dtype : str, optional

Review comment:
       Default?







[GitHub] [tvm] tkonolige commented on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
tkonolige commented on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-858753090


   To clarify: you are replacing the topi dense implementations with the new matmul one, but you are leaving `nn.dense` as a Relay op and adding another Relay op called `nn.matmul`. Why not remove the `nn.dense` op?





[GitHub] [tvm] junrushao1994 commented on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-858745155


   CC @altanh @tkonolige 





[GitHub] [tvm] jcf94 edited a comment on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 edited a comment on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-868938741


   Hi @junrushao1994 @tkonolige @comaniac @altanh I've tried several ways to remove `nn.dense`, but doing so results in modifications to nearly 70 files and some strange CI errors.
   
   For now I think it is still better to keep both `nn.dense` and `nn.matmul`. We can gradually remove the use of `nn.dense` and eventually deprecate it.
   
   Also cc @tqchen @FrozenGene 





[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r661088809



##########
File path: python/tvm/relay/op/strategy/x86.py
##########
@@ -370,6 +370,55 @@ def conv1d_strategy_cpu(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register("cpu")
+def matmul_strategy_cpu(attrs, inputs, out_type, target):
+    """matmul x86 strategy"""
+    strategy = _op.OpStrategy()
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul, need_auto_scheduler_layout=True),
+            naive_schedule,
+            name="matmul.generic",
+            plevel=11,
+        )
+    else:
+        logger.warning("Matmul other than NT format is not optimized for x86.")
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul),
+            naive_schedule,
+            name="matmul.generic",
+        )
+
+    same_type = inputs[0].dtype == inputs[1].dtype == out_type.dtype
+    dtype = inputs[0].dtype
+    u8s8s32 = dtype == "uint8" and inputs[1].dtype == "int8" and out_type.dtype == "int32"
+    if "cblas" in target.libs:

Review comment:
       Great, warning log added.







[GitHub] [tvm] jcf94 commented on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-871053943


   Thanks! @comaniac @tkonolige
   I'll add topi compute support for multi-dimensional tensor inputs in another PR.





[GitHub] [tvm] jcf94 commented on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-859215400


   @tkonolige @comaniac @altanh Thanks!
   
   Aside from these nit comments about naming and coding style, I think the main remaining problem is how to keep the existing `nn.dense` schedules for different devices working while adding a new `nn.matmul` op.
   
   @comaniac's solution is exactly what I had in mind.





[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660226675



##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,6 +1471,47 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Matmul operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+        `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
+
+    weight : tvm.relay.Expr
+        The weight expressions, 2-D matrix,
+        of shape `(units_in, units)` or `(units, units_in)`.
+
+    units : int, optional
+        Number of hidden units of the matmul transformation.

Review comment:
       I think the doc explains it well enough: "The hidden units." This is copied from the original `nn.dense`.







[GitHub] [tvm] tkonolige commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660188664



##########
File path: include/tvm/relay/attrs/nn.h
##########
@@ -961,6 +961,32 @@ struct AvgPool3DAttrs : public tvm::AttrsNode<AvgPool3DAttrs> {
   }
 };
 
+/*! \brief Attributes for matmul operator */
+struct MatmulAttrs : public tvm::AttrsNode<MatmulAttrs> {
+  IndexExpr units;
+  DataType out_dtype;
+  bool data_transposed;
+  bool weight_transposed;
+  tvm::String auto_scheduler_rewritten_layout;  // The layout after auto-scheduler's layout rewrite

Review comment:
       Why is this field necessary?

##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -1204,7 +1208,7 @@ def from_tensorflow(self, graph, layout="NHWC", shape=None, outputs=None):
         return func, self._params
 
 
-def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
+def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):

Review comment:
       I don't think we should have a flag here. We should just commit to one codepath.

##########
File path: python/tvm/relay/op/_tensor_grad.py
##########
@@ -554,6 +554,35 @@ def dense_grad(orig, grad):
     ]
 
 
+@register_gradient("nn.matmul")
+def matmul_grad(orig, grad):
+    """Returns [grad' @ weight, data @ grad']"""
+    data, weight = orig.args
+    if (orig.attrs["data_transposed"], orig.attrs["weight_transposed"]) == (True, True):

Review comment:
       Please refactor this so that it does not if/else on every possible combination of transposes.
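   
   As an illustration of the refactor being asked for, here is a minimal NumPy sketch (not the Relay gradient itself; the helper names are hypothetical) that derives both gradients once and undoes the transposes at the end, instead of branching on all four flag combinations:
   
   ```python
   import numpy as np
   
   def t(x, flag):
       """Transpose x only when flag is set."""
       return x.T if flag else x
   
   def matmul_grad(a, b, grad, transpose_a=False, transpose_b=False):
       # Forward: y = t(a, ta) @ t(b, tb). Gradients w.r.t. the *effective* operands:
       d_a_eff = grad @ t(b, transpose_b).T   # dL/d(t(a, ta))
       d_b_eff = t(a, transpose_a).T @ grad   # dL/d(t(b, tb))
       # Undo the transposes once instead of enumerating every combination.
       return t(d_a_eff, transpose_a), t(d_b_eff, transpose_b)
   
   a, b = np.random.rand(2, 3), np.random.rand(4, 3)
   grad = np.ones((2, 4))                     # shape of y = a @ b.T
   da, db = matmul_grad(a, b, grad, transpose_b=True)
   print(da.shape, db.shape)                  # (2, 3) (4, 3)
   ```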

##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,6 +1471,47 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Matmul operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+        `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
+
+    weight : tvm.relay.Expr
+        The weight expressions, 2-D matrix,
+        of shape `(units_in, units)` or `(units, units_in)`.
+
+    units : int, optional
+        Number of hidden units of the matmul transformation.

Review comment:
       What is a unit?

##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,6 +1471,47 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Matmul operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+        `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.

Review comment:
       Shouldn't both input shapes be 2-dimensional?

##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -1160,21 +1186,46 @@ def batch_flatten_shape_func(attrs, inputs, _):
 
 
 @script
-def _dense_shape_func(data_shape, weight_shape):
+def _matmul_shape_func(data_shape, weight_shape, data_transposed, weight_transposed):
     out = output_tensor((data_shape.shape[0],), "int64")
     for i in const_range(out.shape[0] - 1):
         out[i] = data_shape[i]
-    out[out.shape[0] - 1] = weight_shape[0]
+    if data_transposed:
+        out[out.shape[0] - 2] = out[out.shape[0] - 1]
+    out[out.shape[0] - 1] = weight_shape[0] if weight_transposed else weight_shape[1]

Review comment:
       This seems really complicated. Shouldn't it just be some part of data_shape and weight_shape depending on the transposes?
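   
   For comparison, a plain-Python sketch of that simpler formulation (a hypothetical helper, not the `@script` shape function in the PR): every output dimension is read directly from `data_shape` or `weight_shape` based on the flags.
   
   ```python
   def matmul_out_shape(data_shape, weight_shape, transpose_a=False, transpose_b=False):
       out = list(data_shape[:-2])                                      # leading batch dims
       out.append(data_shape[-1] if transpose_a else data_shape[-2])    # rows of the (possibly transposed) data
       out.append(weight_shape[-2] if transpose_b else weight_shape[-1])
       return out
   
   print(matmul_out_shape((16, 64), (32, 64), transpose_b=True))  # [16, 32]
   ```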

##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -38,6 +46,12 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
     out_dtype : Optional[str]
         The output type. This is used for mixed precision.
 
+    data_transposed : Optional[bool]

Review comment:
       Add the default values to this







[GitHub] [tvm] tqchen edited a comment on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-873407694


   @jcf94 There are a few regressions in the PyTorch frontend and other places that might be caused by this PR; please follow up with a fix. Specifically, dense originally maps to dot(A, B.T) instead.
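   
   For context, a small NumPy sketch of the contract mentioned here: `nn.dense` computes dot(A, B.T), so a frontend that previously emitted dense expects the weight in (units, units_in) layout; mapping it to a non-transposed matmul changes the expected weight layout.
   
   ```python
   import numpy as np
   
   A = np.random.rand(5, 8)   # (batch, units_in)
   B = np.random.rand(3, 8)   # (units, units_in) -- the layout nn.dense expects
   
   dense_out = A @ B.T        # dot(A, B.T): shape (5, 3), what nn.dense produces
   # A non-transposed matmul of the same tensors would not even be shape-compatible:
   # A @ B -> (5, 8) x (3, 8) is invalid.
   print(dense_out.shape)
   ```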





[GitHub] [tvm] tqchen commented on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-873407694


   @jcf94 This PR could cause a few regressions in the PyTorch frontend and other places; please follow up with a fix. Specifically, dense originally maps to dot(A, B.T) instead.





[GitHub] [tvm] comaniac commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r659979731



##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -1160,21 +1186,46 @@ def batch_flatten_shape_func(attrs, inputs, _):
 
 
 @script
-def _dense_shape_func(data_shape, weight_shape):
+def _matmul_shape_func(data_shape, weight_shape, data_transposed, weight_transposed):

Review comment:
       nit: the two inputs of matmul do not necessarily have to be `data` and `weight`, although that is almost always the case in DNNs. We may consider calling the flags `transpose_a` and `transpose_b`, as cuDNN does.

##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -44,6 +44,10 @@
 
 __all__ = ["from_tensorflow"]
 
+# By default, TVM convert `tf.matmul` to `nn.dense` op with data tensor non-transposed and weight
+# tensor transposed
+_USE_DENSE_INSTEAD_OF_MATMUL = True

Review comment:
       ```suggestion
   # The default configurations of the Relay TensorFlow frontend.
   TF_DEFAULT_CONFIGS = {
     # By default, TVM converts `tf.matmul` to `transpose(weight) + nn.dense`, which introduces
     # unnecessary overhead in the weight transpose. Change this flag to False to convert directly to
     # `nn.matmul` and get rid of the overhead. However, please note that `nn.matmul` is experimental,
     # so it may have some performance issues.
     "use_dense": True,
   }
   ```
   
   I reviewed the primary entrypoint and feel we may need to make it more general, as we may have other configurations in the future. Also, I used the name "default" to imply that it may be overridden by user-specified values.
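   
   A hypothetical sketch of how user-supplied options could override `TF_DEFAULT_CONFIGS`, following the "default" naming suggested above (the function and parameter names are illustrative, not part of the PR):
   
   ```python
   TF_DEFAULT_CONFIGS = {
       "use_dense": True,
   }
   
   def resolve_configs(user_configs=None):
       configs = dict(TF_DEFAULT_CONFIGS)  # start from the defaults
       configs.update(user_configs or {})  # user-specified values win
       return configs
   
   print(resolve_configs({"use_dense": False}))  # {'use_dense': False}
   ```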

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""

Review comment:
       ditto

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, cblas)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, cblas)
 
 
 @autotvm.register_topi_schedule("dense_cblas.x86")
 def schedule_dense_cblas(_, outs):
     """Create schedule for dense_cblas"""
-    return generic.schedule_extern(outs)
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkl.x86")
 def dense_mkl(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkl"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, mkl)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, mkl)
 
 
 @autotvm.register_topi_schedule("dense_mkl.x86")
 def schedule_dense_mkl(_, outs):
     """Create schedule for dense_mkl"""
-    # return generic.schedule_extern(outs)
-    s = te.create_schedule([x.op for x in outs])
-    te.schedule.AutoInlineInjective(s)
-
-    def _callback(op):
-        if "broadcast" in op.tag or "injective" in op.tag or "elemwise" in op.tag:
-            schedule_injective_from_existing(s, op.output(0))
-
-    # traverse_inline(s, outs[0].op, _callback)
-    for out in outs:
-        if "dense" not in out.op.name:
-            schedule_injective_from_existing(s, out)
-    return s
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkldnn.x86")
 def dense_mkldnn(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkldnn"""

Review comment:
       ditto

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, cblas)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, cblas)
 
 
 @autotvm.register_topi_schedule("dense_cblas.x86")
 def schedule_dense_cblas(_, outs):
     """Create schedule for dense_cblas"""
-    return generic.schedule_extern(outs)
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkl.x86")
 def dense_mkl(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkl"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, mkl)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, mkl)
 
 
 @autotvm.register_topi_schedule("dense_mkl.x86")
 def schedule_dense_mkl(_, outs):
     """Create schedule for dense_mkl"""
-    # return generic.schedule_extern(outs)
-    s = te.create_schedule([x.op for x in outs])
-    te.schedule.AutoInlineInjective(s)
-
-    def _callback(op):
-        if "broadcast" in op.tag or "injective" in op.tag or "elemwise" in op.tag:
-            schedule_injective_from_existing(s, op.output(0))
-
-    # traverse_inline(s, outs[0].op, _callback)
-    for out in outs:
-        if "dense" not in out.op.name:
-            schedule_injective_from_existing(s, out)
-    return s
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkldnn.x86")
 def dense_mkldnn(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkldnn"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, mkldnn)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, mkldnn)
 
 
 @autotvm.register_topi_schedule("dense_mkldnn.x86")
 def schedule_dense_mkldnn(_, outs):
     """Create schedule for dense_mkldnn"""

Review comment:
       ditto

##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TT"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TN"
+        compute_tag = "matmul"
+    else:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_dense"
+            compute_tag = "dense"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul"
+            compute_tag = "matmul"
+
+    mat = te.compute(
         (batch, out_dim),
-        lambda i, j: te.sum(data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k),
-        name="T_dense",
-        tag="dense",
+        compute_lambda,
+        name=compute_name,
+        tag=compute_tag,
         attrs={"layout_free_placeholders": [weight]},
     )
+
     if bias is not None:
-        matmul = te.compute(
+        mat = te.compute(
             (batch, out_dim),
-            lambda i, j: matmul[i, j] + bias[j].astype(out_dtype),
+            lambda i, j: mat[i, j] + bias[j].astype(out_dtype),
             tag=tag.BROADCAST,
         )
 
     if auto_scheduler_rewritten_layout:
-        matmul = auto_scheduler.rewrite_compute_body(matmul, auto_scheduler_rewritten_layout)
+        mat = auto_scheduler.rewrite_compute_body(mat, auto_scheduler_rewritten_layout)
+
+    return mat
+
 
-    return matmul
+@tvm.target.generic_func
+def matmul_legalize(attrs, inputs, types):
+    """Legalizes matmul op.
+
+    Parameters
+    ----------
+    attrs : tvm.ir.Attrs
+        Attributes of current matmul
+    inputs : list of tvm.relay.Expr
+        The args of the Relay expr to be legalized
+    types : list of types
+        List of input and output types
+
+    Returns
+    -------
+    result : tvm.relay.Expr
+        The legalized expr
+    """
+    # not to change by default
+    # pylint: disable=unused-argument
+    return None
+
+
+def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layout=""):
+    """The default implementation of dense in topi.

Review comment:
       ```suggestion
       """The default implementation of dense in topi. It is identical to matmul_nt.
   ```

##########
File path: python/tvm/topi/cuda/dense.py
##########
@@ -77,6 +111,36 @@ def _callback(op):
     return s
 
 
+@autotvm.register_topi_compute("matmul_default.cuda")
+def matmul_default_cuda(
+    cfg,
+    data,
+    weight,
+    bias=None,
+    out_dtype=None,
+    data_transposed=False,
+    weight_transposed=False,
+):
+    """Matmul operator on cuda"""

Review comment:
       ditto

##########
File path: python/tvm/relay/op/strategy/cuda.py
##########
@@ -698,6 +698,36 @@ def conv1d_transpose_strategy_cuda(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register(["cuda", "gpu"])
+def matmul_strategy_cuda(attrs, inputs, out_type, target):
+    """Matmul cuda strategy"""
+    strategy = _op.OpStrategy()
+
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul),
+            naive_schedule,
+            name="matmul.cuda",
+        )
+    else:
+        logger.warning("Matmul other than NT format is not optimized for cuda.")

Review comment:
       - People may not know what "NT" is, as we didn't use this term in the matmul op.
   - We may recommend using cublas in this case.
   

##########
File path: python/tvm/relay/op/strategy/x86.py
##########
@@ -370,6 +370,55 @@ def conv1d_strategy_cpu(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register("cpu")
+def matmul_strategy_cpu(attrs, inputs, out_type, target):
+    """matmul x86 strategy"""
+    strategy = _op.OpStrategy()
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul, need_auto_scheduler_layout=True),
+            naive_schedule,
+            name="matmul.generic",
+            plevel=11,
+        )
+    else:
+        logger.warning("Matmul other than NT format is not optimized for x86.")

Review comment:
       ditto.

##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -1230,6 +1239,9 @@ def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
     params : dict of str to tvm.nd.NDArray
         Dict of converted parameters stored in tvm.nd.NDArray format
     """
+    global _USE_DENSE_INSTEAD_OF_MATMUL

Review comment:
       I feel this is confusing too. If `_USE_DENSE_INSTEAD_OF_MATMUL` is not supposed to be changed by users directly, we should improve the comments on this global variable. Please see my comment at the global variable.
   
   Btw, in this case we can simply write `_USE_DENSE_INSTEAD_OF_MATMUL = use_dense_op` without checking whether they are the same.

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, cblas)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, cblas)
 
 
 @autotvm.register_topi_schedule("dense_cblas.x86")
 def schedule_dense_cblas(_, outs):
     """Create schedule for dense_cblas"""
-    return generic.schedule_extern(outs)
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkl.x86")
 def dense_mkl(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkl"""

Review comment:
       ditto

##########
File path: python/tvm/topi/cuda/dense.py
##########
@@ -51,6 +57,34 @@ def dense_cublas(cfg, data, weight, bias=None, out_dtype=None):
     return matmul
 
 
+@autotvm.register_topi_compute("matmul_cublas.cuda")
+def matmul_cublas(
+    cfg,
+    data,
+    weight,
+    bias=None,
+    out_dtype=None,
+    data_transposed=False,
+    weight_transposed=False,
+):
+    """Matmul operator on CUDA with CUBLAS"""
+    return _matmul_cublas_common(
+        cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed
+    )
+
+
+@autotvm.register_topi_schedule("matmul_cublas.cuda")
+def schedule_matmul_cublas(_, outs):
+    """Schedule matmul operator using CUBLAS"""
+    return generic.schedule_extern(outs)
+
+
+@autotvm.register_topi_compute("dense_cublas.cuda")
+def dense_cublas(cfg, data, weight, bias=None, out_dtype=None):
+    """Dense operator on CUDA with CUBLAS"""

Review comment:
       ```suggestion
       """Dense operator on CUDA with CUBLAS. This is an alias of matmul_nt operator"""
   ```
   Please add a note to all other similar places saying that the dense implementation is identical to matmul_nt.

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, cblas)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, cblas)
 
 
 @autotvm.register_topi_schedule("dense_cblas.x86")
 def schedule_dense_cblas(_, outs):
     """Create schedule for dense_cblas"""

Review comment:
       ditto

##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,6 +1471,47 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Matmul operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+        `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
+
+    weight : tvm.relay.Expr
+        The weight expressions, 2-D matrix,
+        of shape `(units_in, units)` or `(units, units_in)`.
+
+    units : int, optional
+        Number of hidden units of the matmul transformation.
+
+    out_dtype : str, optional
+        Specifies the output data type for mixed precision dense,
+        of shape `(d_1, d_2, ..., d_n, units)`.
+
+    data_transposed : bool, optional
+        Whether the data tensor is in transposed format.
+
+    weight_transposed : bool, optional
+        Whether the weight tensor is in transposed format.
+
+    Returns
+    -------
+    result : tvm.relay.Expr
+        The computed result.
+    """
+    if not data_transposed and weight_transposed:
+        return dense(data, weight, units, out_dtype)

Review comment:
       Add a TODO saying that this will be removed once dense has been simplified?

##########
File path: python/tvm/relay/op/strategy/x86.py
##########
@@ -370,6 +370,55 @@ def conv1d_strategy_cpu(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register("cpu")
+def matmul_strategy_cpu(attrs, inputs, out_type, target):
+    """matmul x86 strategy"""
+    strategy = _op.OpStrategy()
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul, need_auto_scheduler_layout=True),
+            naive_schedule,
+            name="matmul.generic",
+            plevel=11,
+        )
+    else:
+        logger.warning("Matmul other than NT format is not optimized for x86.")
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul),
+            naive_schedule,
+            name="matmul.generic",
+        )
+
+    same_type = inputs[0].dtype == inputs[1].dtype == out_type.dtype
+    dtype = inputs[0].dtype
+    u8s8s32 = dtype == "uint8" and inputs[1].dtype == "int8" and out_type.dtype == "int32"
+    if "cblas" in target.libs:

Review comment:
       IMHO, if users specified a kernel library in their target but we cannot use that library for this matmul, it might be better to emit a warning.
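   
   A minimal sketch of that suggestion (hypothetical helper names; not the strategy code itself): warn when a BLAS library requested via `target.libs` cannot be used for this matmul because of the dtypes.
   
   ```python
   import logging
   
   logger = logging.getLogger("strategy")
   
   def warn_if_lib_unusable(target_libs, dtypes_supported):
       # If the user asked for a BLAS library but the dtypes rule it out, say so
       # instead of silently falling back to the generic schedule.
       for lib in ("cblas", "mkl", "mkldnn"):
           if lib in target_libs and not dtypes_supported:
               logger.warning(
                   "%s was requested in target.libs but cannot be used for this matmul; "
                   "falling back to the generic schedule.", lib
               )
   
   warn_if_lib_unusable({"cblas"}, dtypes_supported=False)
   ```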

##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TT"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TN"
+        compute_tag = "matmul"
+    else:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_dense"

Review comment:
       @jcf94 Please discuss, as I raised the same concern above.

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, cblas)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, cblas)
 
 
 @autotvm.register_topi_schedule("dense_cblas.x86")
 def schedule_dense_cblas(_, outs):
     """Create schedule for dense_cblas"""
-    return generic.schedule_extern(outs)
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkl.x86")
 def dense_mkl(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkl"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, mkl)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, mkl)
 
 
 @autotvm.register_topi_schedule("dense_mkl.x86")
 def schedule_dense_mkl(_, outs):
     """Create schedule for dense_mkl"""

Review comment:
       ditto




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] comaniac commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660255574



##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TT"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TN"
+        compute_tag = "matmul"
+    else:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_dense"

Review comment:
       I personally vote for B.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tkonolige commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660790719



##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,6 +1471,47 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Matmul operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+        `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.

Review comment:
       Can you update the definition of the computation above to reflect these shapes then?
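
    One possible shape for the updated docstring math, restricted to the 2-D case for clarity (this is only a sketch, one equation per transpose-flag combination, not the patch's actual wording):

    ```
    .. math::

        Y = X \cdot W, \qquad
        Y = X^T \cdot W, \qquad
        Y = X \cdot W^T, \qquad
        Y = X^T \cdot W^T
    ```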




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r649638468



##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,7 +1471,46 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
-def dense(data, weight, units=None, out_dtype=""):
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Dense operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+    `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
+
+    weight : tvm.relay.Expr
+        The weight expressions, 2-D matrix,
+        of shape `(units_in, units)` or `(units, units_in)`.
+
+    units : int, optional
+        Number of hidden units of the dense transformation.
+
+    out_dtype : str, optional
+        Specifies the output data type for mixed precision dense,
+        of shape `(d_1, d_2, ..., d_n, units)`.
+
+    data_transposed : bool, optional
+        Whether the data tensor is in transposed format.
+
+    weight_transposed : bool, optional
+        Whether the weight tensor is in transposed format.
+
+    Returns
+    -------
+    result : tvm.relay.Expr
+        The computed result.
+    """
+    return _make.matmul(data, weight, units, out_dtype, data_transposed, weight_transposed)
+
+
+def dense(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=True):

Review comment:
       TVM has some APIs that call `relay_op(xxx, **attrs)`; such Python function calls cannot work if the two parameters are missing from the signature.
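
    A small illustration of the calling pattern being referred to (the `rebuild_call` helper is hypothetical; `units` and `out_dtype` are existing kwargs of `relay.nn.dense`):

    ```python
    from tvm import relay

    # Some generic utilities rebuild an op call as relay_op(*args, **attrs),
    # so every attribute key must be accepted as a keyword by the Python wrapper.
    def rebuild_call(relay_op, args, attrs):
        return relay_op(*args, **attrs)

    x = relay.var("x", shape=(4, 8))
    w = relay.var("w", shape=(16, 8))
    # Works because relay.nn.dense accepts `units` and `out_dtype` as kwargs;
    # an attribute missing from the wrapper's signature would raise a TypeError.
    y = rebuild_call(relay.nn.dense, [x, w], {"units": 16, "out_dtype": "float32"})
    ```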

##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -1230,6 +1239,9 @@ def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
     params : dict of str to tvm.nd.NDArray
         Dict of converted parameters stored in tvm.nd.NDArray format
     """
+    global _USE_DENSE_INSTEAD_OF_MATMUL

Review comment:
       Yeah, I've also tried several ways, but it seems there is no better solution from my view. A Python module can be seen as a const singleton, so this should be safe as long as the `from_tensorflow` function is the only entry point.

##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -52,6 +52,32 @@
 reg.register_pattern("nn.log_softmax", OpPattern.OPAQUE)
 
 
+@reg.register_legalize("nn.matmul")
+def leaglize_matmul(attrs, inputs, types):
+    """Legalize matmul op.

Review comment:
       Actually nothing for now; this is just a blank function, the same as the other ops have.

##########
File path: python/tvm/relay/op/strategy/cuda.py
##########
@@ -698,6 +698,26 @@ def conv1d_transpose_strategy_cuda(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register(["cuda", "gpu"])
+def matmul_strategy_cuda(attrs, inputs, out_type, target):
+    """dense cuda strategy"""
+    strategy = _op.OpStrategy()
+    if target.kind.name == "cuda" and "cublas" in target.libs:
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.cuda.matmul_cublas),
+            wrap_topi_schedule(topi.cuda.schedule_matmul_cublas),
+            name="matmul_cublas.cuda",
+            plevel=25,
+        )
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul),
+            naive_schedule,
+            name="matmul.cuda",
+        )

Review comment:
       In the current implementation, all `matmul` ops with the `NT` layout are transformed to `dense` at the beginning.

##########
File path: tests/python/relay/test_op_level1.py
##########
@@ -426,7 +486,7 @@ def test_dense():
     for dtype in ["float16", "float32"]:
         # Dense accuracy for float16 is poor
         if dtype == "float16":
-            return
+            continue

Review comment:
       I think this is a bug. This function would always return early, so no check would actually be performed in this UT.

##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,7 +1471,46 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
-def dense(data, weight, units=None, out_dtype=""):
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Dense operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+    `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
+
+    weight : tvm.relay.Expr
+        The weight expressions, 2-D matrix,
+        of shape `(units_in, units)` or `(units, units_in)`.
+
+    units : int, optional
+        Number of hidden units of the dense transformation.
+
+    out_dtype : str, optional
+        Specifies the output data type for mixed precision dense,
+        of shape `(d_1, d_2, ..., d_n, units)`.
+
+    data_transposed : bool, optional
+        Whether the data tensor is in transposed format.
+
+    weight_transposed : bool, optional
+        Whether the weight tensor is in transposed format.
+
+    Returns
+    -------
+    result : tvm.relay.Expr
+        The computed result.
+    """
+    return _make.matmul(data, weight, units, out_dtype, data_transposed, weight_transposed)
+
+
+def dense(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=True):

Review comment:
       TVM has some APIs that call `relay_op(xxx, **attrs)`; such Python function calls cannot work if the two parameters are missing, unless we still keep a `DenseAttrs`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tkonolige commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660788822



##########
File path: include/tvm/relay/attrs/nn.h
##########
@@ -961,6 +961,32 @@ struct AvgPool3DAttrs : public tvm::AttrsNode<AvgPool3DAttrs> {
   }
 };
 
+/*! \brief Attributes for matmul operator */
+struct MatmulAttrs : public tvm::AttrsNode<MatmulAttrs> {
+  IndexExpr units;
+  DataType out_dtype;
+  bool data_transposed;
+  bool weight_transposed;
+  tvm::String auto_scheduler_rewritten_layout;  // The layout after auto-scheduler's layout rewrite

Review comment:
       No, I'm wondering about `auto_scheduler_rewritten_layout`. Why is that part of the op definition?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660253600



##########
File path: python/tvm/relay/op/strategy/x86.py
##########
@@ -370,6 +370,55 @@ def conv1d_strategy_cpu(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register("cpu")
+def matmul_strategy_cpu(attrs, inputs, out_type, target):
+    """matmul x86 strategy"""
+    strategy = _op.OpStrategy()
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul, need_auto_scheduler_layout=True),
+            naive_schedule,
+            name="matmul.generic",
+            plevel=11,
+        )
+    else:
+        logger.warning("Matmul other than NT format is not optimized for x86.")
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul),
+            naive_schedule,
+            name="matmul.generic",
+        )
+
+    same_type = inputs[0].dtype == inputs[1].dtype == out_type.dtype
+    dtype = inputs[0].dtype
+    u8s8s32 = dtype == "uint8" and inputs[1].dtype == "int8" and out_type.dtype == "int32"
+    if "cblas" in target.libs:

Review comment:
       I agree, but it seems there is no API for `SpecializedCondition` to handle the False path?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] altanh commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
altanh commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r649363398



##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -1230,6 +1239,9 @@ def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
     params : dict of str to tvm.nd.NDArray
         Dict of converted parameters stored in tvm.nd.NDArray format
     """
+    global _USE_DENSE_INSTEAD_OF_MATMUL

Review comment:
       Is it possible to avoid using this global variable? I'm not familiar with the importer, but it would be nice if we could use an importer config dict or something.

##########
File path: include/tvm/relay/attrs/nn.h
##########
@@ -961,19 +961,29 @@ struct AvgPool3DAttrs : public tvm::AttrsNode<AvgPool3DAttrs> {
   }
 };
 
-/*! \brief Attributes for dense operator */
-struct DenseAttrs : public tvm::AttrsNode<DenseAttrs> {
+/*! \brief Attributes for matmul operator and dense operator */
+struct MatmulAttrs : public tvm::AttrsNode<MatmulAttrs> {
   IndexExpr units;
-  tvm::String auto_scheduler_rewritten_layout;  // The layout after auto-scheduler's layout rewrite
   DataType out_dtype;
+  bool data_transposed;

Review comment:
       nit: I wonder if we should use `transpose_data` and `transpose_weight` to more closely match existing frameworks (e.g. BLAS libraries use `transa`, TensorFlow uses `transpose_a`).

##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -52,6 +52,32 @@
 reg.register_pattern("nn.log_softmax", OpPattern.OPAQUE)
 
 
+@reg.register_legalize("nn.matmul")
+def leaglize_matmul(attrs, inputs, types):
+    """Legalize matmul op.

Review comment:
       could you summarize what the legalization does for this op?

##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1496,11 +1535,21 @@ def dense(data, weight, units=None, out_dtype=""):
         Specifies the output data type for mixed precision dense,
         of shape `(d_1, d_2, ..., d_n, units)`.
 
+    data_transposed : bool, optional
+        Whether the data tensor is in transposed format. Expected to be False.
+
+    weight_transposed : bool, optional
+        Whether the weight tensor is in transposed format. Expected to be True.
+
     Returns
     -------
     result : tvm.relay.Expr
         The computed result.
     """
+    # Add data_transposed & weight_transposed parameters for some API requires to apply

Review comment:
       could you explain why?

##########
File path: tests/python/relay/test_op_level1.py
##########
@@ -426,7 +486,7 @@ def test_dense():
     for dtype in ["float16", "float32"]:
         # Dense accuracy for float16 is poor
         if dtype == "float16":
-            return
+            continue

Review comment:
       did some PR affect this?

##########
File path: src/relay/op/nn/nn.h
##########
@@ -83,11 +90,12 @@ bool DenseRel(const Array<Type>& types, int num_inputs, const Attrs& attrs,
     } else {
       ICHECK(static_cast<int>(weight->shape.size()) == 2);
       if (!data->shape.back().as<tir::AnyNode>()) {
-        ICHECK(reporter->AssertEQ(data->shape[data->shape.size() - 1], weight->shape[1]))
-            << "DenseRel: input dimension doesn't match,"
+        ICHECK((param->weight_transposed && reporter->AssertEQ(reduce, weight->shape[1])) ||

Review comment:
       could you try using diagnostics for this?

##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TT"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TN"
+        compute_tag = "matmul"
+    else:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_dense"
+            compute_tag = "dense"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul"

Review comment:
       `T_matmul_NN` for consistency

##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TT"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TN"
+        compute_tag = "matmul"
+    else:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_dense"

Review comment:
       do we need to keep this as `dense` or can we unify it to be `T_matmul_NT`?

##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -52,6 +52,32 @@
 reg.register_pattern("nn.log_softmax", OpPattern.OPAQUE)
 
 
+@reg.register_legalize("nn.matmul")
+def leaglize_matmul(attrs, inputs, types):
+    """Legalize matmul op.
+
+    Parameters
+    ----------
+    attrs : tvm.ir.Attrs
+        Attributes of current convolution

Review comment:
       ```suggestion
           Attributes of current matmul
   ```

##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -52,6 +52,32 @@
 reg.register_pattern("nn.log_softmax", OpPattern.OPAQUE)
 
 
+@reg.register_legalize("nn.matmul")
+def leaglize_matmul(attrs, inputs, types):

Review comment:
       typo, and can we standardize it with `topi.nn.matmul_legalize`? (choose one of `legalize_matmul` or `matmul_legalize`, I prefer the first)

##########
File path: python/tvm/relay/op/strategy/cuda.py
##########
@@ -698,6 +698,26 @@ def conv1d_transpose_strategy_cuda(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register(["cuda", "gpu"])
+def matmul_strategy_cuda(attrs, inputs, out_type, target):
+    """dense cuda strategy"""
+    strategy = _op.OpStrategy()
+    if target.kind.name == "cuda" and "cublas" in target.libs:
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.cuda.matmul_cublas),
+            wrap_topi_schedule(topi.cuda.schedule_matmul_cublas),
+            name="matmul_cublas.cuda",
+            plevel=25,
+        )
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul),
+            naive_schedule,
+            name="matmul.cuda",
+        )

Review comment:
       Is it possible to fall back to dense schedules when the layout is `NT`, or are we going to try to unify it all to matmul in a later PR?

##########
File path: python/tvm/relay/op/strategy/cuda.py
##########
@@ -698,6 +698,26 @@ def conv1d_transpose_strategy_cuda(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register(["cuda", "gpu"])
+def matmul_strategy_cuda(attrs, inputs, out_type, target):
+    """dense cuda strategy"""

Review comment:
       ```suggestion
       """matmul cuda strategy"""
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tkonolige commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660790415



##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,6 +1471,47 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Matmul operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+        `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
+
+    weight : tvm.relay.Expr
+        The weight expressions, 2-D matrix,
+        of shape `(units_in, units)` or `(units, units_in)`.
+
+    units : int, optional
+        Number of hidden units of the matmul transformation.

Review comment:
       I don't think this is clear at all. Are the hidden units the inner dimension of the matmul?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660228979



##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -1204,7 +1208,7 @@ def from_tensorflow(self, graph, layout="NHWC", shape=None, outputs=None):
         return func, self._params
 
 
-def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
+def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):

Review comment:
       The problem is that we're not able to remove all the `nn.dense` usages at this moment, and there are not enough AutoTVM templates for `nn.matmul`.
    
    So the use of `nn.matmul` can only be seen as an experimental feature. We should not change the default behavior, in case this affects those who are currently using `nn.dense`.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660226315



##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -1160,21 +1186,46 @@ def batch_flatten_shape_func(attrs, inputs, _):
 
 
 @script
-def _dense_shape_func(data_shape, weight_shape):
+def _matmul_shape_func(data_shape, weight_shape, data_transposed, weight_transposed):
     out = output_tensor((data_shape.shape[0],), "int64")
     for i in const_range(out.shape[0] - 1):
         out[i] = data_shape[i]
-    out[out.shape[0] - 1] = weight_shape[0]
+    if data_transposed:
+        out[out.shape[0] - 2] = out[out.shape[0] - 1]
+    out[out.shape[0] - 1] = weight_shape[0] if weight_transposed else weight_shape[1]

Review comment:
       Since the data tensor can have more than 2 dimensions, this is the simplest way to implement it.
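
    A plain-Python sketch of the intended output-shape rule (illustrative only, not the hybrid-script implementation itself):

    ```python
    # Leading dims of the data tensor are kept, the reduction dim is dropped,
    # and the last output dim comes from the weight tensor.
    def matmul_out_shape(data_shape, weight_shape, transpose_a, transpose_b):
        out = list(data_shape[:-2])
        out.append(data_shape[-1] if transpose_a else data_shape[-2])
        out.append(weight_shape[0] if transpose_b else weight_shape[1])
        return tuple(out)

    assert matmul_out_shape((2, 32, 64), (16, 64), False, True) == (2, 32, 16)
    assert matmul_out_shape((64, 32), (64, 16), True, False) == (32, 16)
    ```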




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tkonolige edited a comment on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
tkonolige edited a comment on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-858753090


   To clarify: you are replacing the topi dense implementations with the new matmul one, but you are leaving `nn.dense` as a Relay op and adding another Relay op called `nn.matmul`. Why not remove the `nn.dense` Relay op?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 commented on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-870515314


   Thanks! @comaniac @tkonolige  Most comments have been addressed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660250782



##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -1160,21 +1186,46 @@ def batch_flatten_shape_func(attrs, inputs, _):
 
 
 @script
-def _dense_shape_func(data_shape, weight_shape):
+def _matmul_shape_func(data_shape, weight_shape, data_transposed, weight_transposed):

Review comment:
       Updated all `data_transposed` & `weight_transposed` to `transpose_a` & `transpose_b`, and also renamed all the `data` & `weight` in matmul to `tensor_a` & `tensor_b`. Tensor names in dense remain unchanged.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tkonolige commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660789978



##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -1204,7 +1208,7 @@ def from_tensorflow(self, graph, layout="NHWC", shape=None, outputs=None):
         return func, self._params
 
 
-def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
+def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):

Review comment:
       Can't we use the dense schedules when A_transpose=false and B_transpose=true? Then we could convert all nn.dense to nn.matmul.
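
    For reference, a minimal sketch of the equivalence being discussed (using the transpose_a/transpose_b names this PR settled on):

    ```python
    from tvm import relay

    x = relay.var("x", shape=(4, 8), dtype="float32")
    w = relay.var("w", shape=(16, 8), dtype="float32")  # stored as (units, units_in)

    dense_out = relay.nn.dense(x, w)                     # shape (4, 16)
    matmul_nt = relay.nn.matmul(x, w, transpose_b=True)  # shape (4, 16), same values
    ```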




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 commented on pull request #8234: [WIP][Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-868938741


   Hi @junrushao1994 @tkonolige @comaniac @altanh I've tried several ways to remove `nn.dense`, but it results in modifications to nearly 70 files and some strange CI errors.
   
   Currently I think it's still better to keep both `nn.dense` and `nn.matmul`. We can gradually remove the use of `nn.dense` in the future and finally deprecate it.
   
   Also cc @tqchen @FrozenGene 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] comaniac commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660801654



##########
File path: python/tvm/relay/op/strategy/x86.py
##########
@@ -370,6 +370,55 @@ def conv1d_strategy_cpu(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register("cpu")
+def matmul_strategy_cpu(attrs, inputs, out_type, target):
+    """matmul x86 strategy"""
+    strategy = _op.OpStrategy()
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul, need_auto_scheduler_layout=True),
+            naive_schedule,
+            name="matmul.generic",
+            plevel=11,
+        )
+    else:
+        logger.warning("Matmul other than NT format is not optimized for x86.")
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul),
+            naive_schedule,
+            name="matmul.generic",
+        )
+
+    same_type = inputs[0].dtype == inputs[1].dtype == out_type.dtype
+    dtype = inputs[0].dtype
+    u8s8s32 = dtype == "uint8" and inputs[1].dtype == "int8" and out_type.dtype == "int32"
+    if "cblas" in target.libs:

Review comment:
       We can simply check if `strategy.specializations` is empty at the end of this function.
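
    A sketch of that check (assuming the locals of `matmul_strategy_cpu`, i.e. `strategy`, `target` and `logger`; the exact wording is illustrative):

    ```python
    requested = {"cblas", "mkl", "mkldnn"} & set(target.libs)
    if requested and not strategy.specializations:
        logger.warning(
            "%s requested in target.libs but no matching matmul implementation "
            "was added for these dtypes; falling back to the generic schedule.",
            ", ".join(sorted(requested)),
        )
    ```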

##########
File path: include/tvm/relay/attrs/nn.h
##########
@@ -961,6 +961,32 @@ struct AvgPool3DAttrs : public tvm::AttrsNode<AvgPool3DAttrs> {
   }
 };
 
+/*! \brief Attributes for matmul operator */
+struct MatmulAttrs : public tvm::AttrsNode<MatmulAttrs> {
+  IndexExpr units;
+  DataType out_dtype;
+  bool data_transposed;
+  bool weight_transposed;
+  tvm::String auto_scheduler_rewritten_layout;  // The layout after auto-scheduler's layout rewrite

Review comment:
       This is used by the auto-scheduler, since it may want to rewrite the layout after applying a schedule. It's not the best solution but the cleanest workaround for now; it is already used by many ops, as you can see by searching for "auto_scheduler_rewritten_layout" in the code base.

##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -1204,7 +1208,7 @@ def from_tensorflow(self, graph, layout="NHWC", shape=None, outputs=None):
         return func, self._params
 
 
-def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
+def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):

Review comment:
       This PR already uses the dense schedule for matmul_nt when lowering to TOPI. On the other hand, as @jcf94 mentioned in the PR comment, doing so would affect many more places in the codebase, and we had better convert them gradually rather than in a single PR. That sounds reasonable to me.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] comaniac commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r659979731



##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -1160,21 +1186,46 @@ def batch_flatten_shape_func(attrs, inputs, _):
 
 
 @script
-def _dense_shape_func(data_shape, weight_shape):
+def _matmul_shape_func(data_shape, weight_shape, data_transposed, weight_transposed):

Review comment:
       nit: the two inputs of matmul are not necessarily `data` and `weight`, although that is almost always the case in DNNs. We may consider calling the flags `transpose_a` and `transpose_b`, as CuDNN does.

##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -44,6 +44,10 @@
 
 __all__ = ["from_tensorflow"]
 
+# By default, TVM convert `tf.matmul` to `nn.dense` op with data tensor non-transposed and weight
+# tensor transposed
+_USE_DENSE_INSTEAD_OF_MATMUL = True

Review comment:
       ```suggestion
   # The default configurations of Relay TensorFlow frontend.
   TF_DEFAULT_CONFIGS = {
     # By default, TVM converts `tf.matmul` to `transpose(weight) + nn.dense`, which introduces
     # unnecessary overhead in weight transpose. Change this flag to False to directly convert to
      # `nn.matmul` to get rid of the overhead. However, please note that `nn.matmul` is experimental,
      # so it may have some performance issues.
     "use_dense": True,
   }
   ```
   
    I reviewed the primary entry point and feel we may need to make it more general, as we may have other configurations in the future. Also, I used the name "default" to imply that it may be overridden by user-specified values.

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""

Review comment:
       ditto

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, cblas)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, cblas)
 
 
 @autotvm.register_topi_schedule("dense_cblas.x86")
 def schedule_dense_cblas(_, outs):
     """Create schedule for dense_cblas"""
-    return generic.schedule_extern(outs)
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkl.x86")
 def dense_mkl(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkl"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, mkl)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, mkl)
 
 
 @autotvm.register_topi_schedule("dense_mkl.x86")
 def schedule_dense_mkl(_, outs):
     """Create schedule for dense_mkl"""
-    # return generic.schedule_extern(outs)
-    s = te.create_schedule([x.op for x in outs])
-    te.schedule.AutoInlineInjective(s)
-
-    def _callback(op):
-        if "broadcast" in op.tag or "injective" in op.tag or "elemwise" in op.tag:
-            schedule_injective_from_existing(s, op.output(0))
-
-    # traverse_inline(s, outs[0].op, _callback)
-    for out in outs:
-        if "dense" not in out.op.name:
-            schedule_injective_from_existing(s, out)
-    return s
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkldnn.x86")
 def dense_mkldnn(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkldnn"""

Review comment:
       ditto

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, cblas)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, cblas)
 
 
 @autotvm.register_topi_schedule("dense_cblas.x86")
 def schedule_dense_cblas(_, outs):
     """Create schedule for dense_cblas"""
-    return generic.schedule_extern(outs)
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkl.x86")
 def dense_mkl(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkl"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, mkl)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, mkl)
 
 
 @autotvm.register_topi_schedule("dense_mkl.x86")
 def schedule_dense_mkl(_, outs):
     """Create schedule for dense_mkl"""
-    # return generic.schedule_extern(outs)
-    s = te.create_schedule([x.op for x in outs])
-    te.schedule.AutoInlineInjective(s)
-
-    def _callback(op):
-        if "broadcast" in op.tag or "injective" in op.tag or "elemwise" in op.tag:
-            schedule_injective_from_existing(s, op.output(0))
-
-    # traverse_inline(s, outs[0].op, _callback)
-    for out in outs:
-        if "dense" not in out.op.name:
-            schedule_injective_from_existing(s, out)
-    return s
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkldnn.x86")
 def dense_mkldnn(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkldnn"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, mkldnn)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, mkldnn)
 
 
 @autotvm.register_topi_schedule("dense_mkldnn.x86")
 def schedule_dense_mkldnn(_, outs):
     """Create schedule for dense_mkldnn"""

Review comment:
       ditto

##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TT"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TN"
+        compute_tag = "matmul"
+    else:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_dense"
+            compute_tag = "dense"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul"
+            compute_tag = "matmul"
+
+    mat = te.compute(
         (batch, out_dim),
-        lambda i, j: te.sum(data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k),
-        name="T_dense",
-        tag="dense",
+        compute_lambda,
+        name=compute_name,
+        tag=compute_tag,
         attrs={"layout_free_placeholders": [weight]},
     )
+
     if bias is not None:
-        matmul = te.compute(
+        mat = te.compute(
             (batch, out_dim),
-            lambda i, j: matmul[i, j] + bias[j].astype(out_dtype),
+            lambda i, j: mat[i, j] + bias[j].astype(out_dtype),
             tag=tag.BROADCAST,
         )
 
     if auto_scheduler_rewritten_layout:
-        matmul = auto_scheduler.rewrite_compute_body(matmul, auto_scheduler_rewritten_layout)
+        mat = auto_scheduler.rewrite_compute_body(mat, auto_scheduler_rewritten_layout)
+
+    return mat
+
 
-    return matmul
+@tvm.target.generic_func
+def matmul_legalize(attrs, inputs, types):
+    """Legalizes matmul op.
+
+    Parameters
+    ----------
+    attrs : tvm.ir.Attrs
+        Attributes of current matmul
+    inputs : list of tvm.relay.Expr
+        The args of the Relay expr to be legalized
+    types : list of types
+        List of input and output types
+
+    Returns
+    -------
+    result : tvm.relay.Expr
+        The legalized expr
+    """
+    # not to change by default
+    # pylint: disable=unused-argument
+    return None
+
+
+def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layout=""):
+    """The default implementation of dense in topi.

Review comment:
       ```suggestion
       """The default implementation of dense in topi. It is identical to matmul_nt.
   ```

##########
File path: python/tvm/topi/cuda/dense.py
##########
@@ -77,6 +111,36 @@ def _callback(op):
     return s
 
 
+@autotvm.register_topi_compute("matmul_default.cuda")
+def matmul_default_cuda(
+    cfg,
+    data,
+    weight,
+    bias=None,
+    out_dtype=None,
+    data_transposed=False,
+    weight_transposed=False,
+):
+    """Matmul operator on cuda"""

Review comment:
       ditto

##########
File path: python/tvm/relay/op/strategy/cuda.py
##########
@@ -698,6 +698,36 @@ def conv1d_transpose_strategy_cuda(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register(["cuda", "gpu"])
+def matmul_strategy_cuda(attrs, inputs, out_type, target):
+    """Matmul cuda strategy"""
+    strategy = _op.OpStrategy()
+
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul),
+            naive_schedule,
+            name="matmul.cuda",
+        )
+    else:
+        logger.warning("Matmul other than NT format is not optimized for cuda.")

Review comment:
       - People may not know what "NT" is, as we didn't use this term in the matmul op.
   - We may recommend using cublas in this case; see the target sketch below.
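
    For example, something along these lines:

    ```python
    import tvm

    # Requesting cuBLAS through -libs lets the cuBLAS-backed matmul
    # implementation be selected for layouts the generic CUDA schedule
    # does not optimize.
    target = tvm.target.Target("cuda -libs=cublas")
    ```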
   

##########
File path: python/tvm/relay/op/strategy/x86.py
##########
@@ -370,6 +370,55 @@ def conv1d_strategy_cpu(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register("cpu")
+def matmul_strategy_cpu(attrs, inputs, out_type, target):
+    """matmul x86 strategy"""
+    strategy = _op.OpStrategy()
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul, need_auto_scheduler_layout=True),
+            naive_schedule,
+            name="matmul.generic",
+            plevel=11,
+        )
+    else:
+        logger.warning("Matmul other than NT format is not optimized for x86.")

Review comment:
       ditto.

##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -1230,6 +1239,9 @@ def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
     params : dict of str to tvm.nd.NDArray
         Dict of converted parameters stored in tvm.nd.NDArray format
     """
+    global _USE_DENSE_INSTEAD_OF_MATMUL

Review comment:
       I feel this is confusing too. If `_USE_DENSE_INSTEAD_OF_MATMUL` is not supposed to be changed by users directly, we should improve the comments of this global variable. Please see my comment at the global variable.
   
    Btw, in this case we can simply do `_USE_DENSE_INSTEAD_OF_MATMUL = use_dense_op` without checking whether they are the same or not.

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, cblas)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, cblas)
 
 
 @autotvm.register_topi_schedule("dense_cblas.x86")
 def schedule_dense_cblas(_, outs):
     """Create schedule for dense_cblas"""
-    return generic.schedule_extern(outs)
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkl.x86")
 def dense_mkl(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkl"""

Review comment:
       ditto

##########
File path: python/tvm/topi/cuda/dense.py
##########
@@ -51,6 +57,34 @@ def dense_cublas(cfg, data, weight, bias=None, out_dtype=None):
     return matmul
 
 
+@autotvm.register_topi_compute("matmul_cublas.cuda")
+def matmul_cublas(
+    cfg,
+    data,
+    weight,
+    bias=None,
+    out_dtype=None,
+    data_transposed=False,
+    weight_transposed=False,
+):
+    """Matmul operator on CUDA with CUBLAS"""
+    return _matmul_cublas_common(
+        cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed
+    )
+
+
+@autotvm.register_topi_schedule("matmul_cublas.cuda")
+def schedule_matmul_cublas(_, outs):
+    """Schedule matmul operator using CUBLAS"""
+    return generic.schedule_extern(outs)
+
+
+@autotvm.register_topi_compute("dense_cublas.cuda")
+def dense_cublas(cfg, data, weight, bias=None, out_dtype=None):
+    """Dense operator on CUDA with CUBLAS"""

Review comment:
       ```suggestion
       """Dense operator on CUDA with CUBLAS. This is an alias of matmul_nt operator"""
   ```
   Please add a note to all other similar places saying the dense implementation is identical to matmul_nt.

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, cblas)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, cblas)
 
 
 @autotvm.register_topi_schedule("dense_cblas.x86")
 def schedule_dense_cblas(_, outs):
     """Create schedule for dense_cblas"""

Review comment:
       ditto

##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,6 +1471,47 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Matmul operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+        `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
+
+    weight : tvm.relay.Expr
+        The weight expressions, 2-D matrix,
+        of shape `(units_in, units)` or `(units, units_in)`.
+
+    units : int, optional
+        Number of hidden units of the matmul transformation.
+
+    out_dtype : str, optional
+        Specifies the output data type for mixed precision dense,
+        of shape `(d_1, d_2, ..., d_n, units)`.
+
+    data_transposed : bool, optional
+        Whether the data tensor is in transposed format.
+
+    weight_transposed : bool, optional
+        Whether the weight tensor is in transposed format.
+
+    Returns
+    -------
+    result : tvm.relay.Expr
+        The computed result.
+    """
+    if not data_transposed and weight_transposed:
+        return dense(data, weight, units, out_dtype)

Review comment:
       Add a TODO saying that this will be removed once dense has been simplified?
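
   For example (a sketch only; the exact wording of the TODO is up to you):

   ```python
   if not data_transposed and weight_transposed:
       # TODO: remove this special case once nn.dense has been simplified into
       #       nn.matmul and the existing dense schedules have been migrated.
       return dense(data, weight, units, out_dtype)
   ```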

##########
File path: python/tvm/relay/op/strategy/x86.py
##########
@@ -370,6 +370,55 @@ def conv1d_strategy_cpu(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register("cpu")
+def matmul_strategy_cpu(attrs, inputs, out_type, target):
+    """matmul x86 strategy"""
+    strategy = _op.OpStrategy()
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul, need_auto_scheduler_layout=True),
+            naive_schedule,
+            name="matmul.generic",
+            plevel=11,
+        )
+    else:
+        logger.warning("Matmul other than NT format is not optimized for x86.")
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul),
+            naive_schedule,
+            name="matmul.generic",
+        )
+
+    same_type = inputs[0].dtype == inputs[1].dtype == out_type.dtype
+    dtype = inputs[0].dtype
+    u8s8s32 = dtype == "uint8" and inputs[1].dtype == "int8" and out_type.dtype == "int32"
+    if "cblas" in target.libs:

Review comment:
       IMHO, if users specify a kernel library in their target but we cannot use that library for this matmul, it would be better to throw a warning.
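
   Roughly like the sketch below (inside `matmul_strategy_cpu`). The compute/schedule names are assumptions that mirror the dense counterparts; adjust to whatever this PR actually registers:

   ```python
   if "cblas" in target.libs:
       if same_type and dtype in ["float32", "float64"]:
           strategy.add_implementation(
               wrap_compute_matmul(topi.x86.matmul_cblas),          # assumed name
               wrap_topi_schedule(topi.x86.schedule_matmul_cblas),  # assumed name
               name="matmul_cblas.x86",
               plevel=13,
           )
       else:
           logger.warning(
               "cblas was specified in the target but cannot be used for this "
               "matmul (unsupported dtype combination); falling back to the "
               "generic schedule."
           )
   ```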

##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TT"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TN"
+        compute_tag = "matmul"
+    else:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_dense"

Review comment:
       @jcf94 please discuss as I have the same issue above.

##########
File path: python/tvm/topi/x86/dense.py
##########
@@ -281,72 +281,121 @@ def _callback(op):
     return s
 
 
-def dense_blas_common(cfg, data, weight, bias, out_dtype, lib):
-    """Compute dense using a BLAS library"""
+def matmul_blas_common(cfg, data, weight, bias, out_dtype, data_transposed, weight_transposed, lib):
+    """Compute matmul/dense using a BLAS library"""
     M, K = get_const_tuple(data.shape)
     N, _ = get_const_tuple(weight.shape)
     if isinstance(M, int) and isinstance(K, int) and isinstance(N, int):
         cfg.add_flop(M * K * N * 2)
     if data.dtype == "uint8" and weight.dtype == "int8" and out_dtype == "int32":
         if not hasattr(lib, "matmul_u8s8s32"):
             raise NotImplementedError(
-                f"Dense with {lib.__name__} for {data.dtype} is not supported "
+                f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported "
                 "(matmulu8s8s32 not imlemented)"
             )
-        C = lib.matmul_u8s8s32(data, weight, False, True, dtype=out_dtype)
+        C = lib.matmul_u8s8s32(data, weight, data_transposed, weight_transposed, dtype=out_dtype)
     elif data.dtype == "float32" or data.dtype == "float64":
-        C = lib.matmul(data, weight, False, True)
+        C = lib.matmul(data, weight, data_transposed, weight_transposed)
     else:
-        raise NotImplementedError(f"Dense with {lib.__name__} for {data.dtype} is not supported")
+        raise NotImplementedError(
+            f"Matmul/Dense with {lib.__name__} for {data.dtype} is not supported"
+        )
 
     if bias is not None:
         C = te.compute(C.shape, lambda i, j: C[i, j] + bias[j].astype(out_dtype), tag=tag.BROADCAST)
     return C
 
 
+def schedule_matmul_blas_common(outs):
+    """Default matmul schedule for BLAS library"""
+    s = te.create_schedule([x.op for x in outs])
+    te.schedule.AutoInlineInjective(s)
+
+    for out in outs:
+        if "dense" not in out.op.tag and "matmul" not in out.op.tag:
+            schedule_injective_from_existing(s, out)
+    return s
+
+
 @autotvm.register_topi_compute("dense_cblas.x86")
 def dense_cblas(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using a cblas"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, cblas)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, cblas)
 
 
 @autotvm.register_topi_schedule("dense_cblas.x86")
 def schedule_dense_cblas(_, outs):
     """Create schedule for dense_cblas"""
-    return generic.schedule_extern(outs)
+    return schedule_matmul_blas_common(outs)
 
 
 @autotvm.register_topi_compute("dense_mkl.x86")
 def dense_mkl(cfg, data, weight, bias=None, out_dtype=None):
     """Compute dense using mkl"""
-    return dense_blas_common(cfg, data, weight, bias, out_dtype, mkl)
+    return matmul_blas_common(cfg, data, weight, bias, out_dtype, False, True, mkl)
 
 
 @autotvm.register_topi_schedule("dense_mkl.x86")
 def schedule_dense_mkl(_, outs):
     """Create schedule for dense_mkl"""

Review comment:
       ditto

##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TT"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TN"
+        compute_tag = "matmul"
+    else:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_dense"

Review comment:
       I personally vote for B.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 edited a comment on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 edited a comment on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-859215400


   @tkonolige @comaniac @altanh Thanks!
   
   Aside from these nit comments about naming and coding style, I think the main remaining problem is still how to keep the existing `nn.dense` schedules for different devices working while adding a new `nn.matmul` op.
   
   @comaniac's solution is exactly what I want. But in the current op strategy, since all of these existing schedules are registered for `nn.dense`, we would still need a real `nn.dense` op.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660229467



##########
File path: include/tvm/relay/attrs/nn.h
##########
@@ -961,6 +961,32 @@ struct AvgPool3DAttrs : public tvm::AttrsNode<AvgPool3DAttrs> {
   }
 };
 
+/*! \brief Attributes for matmul operator */
+struct MatmulAttrs : public tvm::AttrsNode<MatmulAttrs> {
+  IndexExpr units;
+  DataType out_dtype;
+  bool data_transposed;
+  bool weight_transposed;
+  tvm::String auto_scheduler_rewritten_layout;  // The layout after auto-scheduler's layout rewrite

Review comment:
       You mean `MatmulAttrs`? We're not able to remove all of the `nn.dense` usages at this moment, so `nn.dense` and `nn.matmul` still have to be two different ops for now, and they need different `Attrs`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org




[GitHub] [tvm] tqchen edited a comment on pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#issuecomment-873407694


   @jcf94 There are a few regressions in the PyTorch frontend and other places that might be related, can you take a look? `dense` should map to `matmul(A, B.T)` (see the sketch after the links below).
   - https://discuss.tvm.apache.org/t/pytorch-unable-to-import-a-simple-torch-nn-linear-to-relay/10383
   - https://github.com/apache/tvm/issues/8392
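
   Concretely, the frontends should produce something like this (a sketch; the keyword names assume the renamed `transpose_a` / `transpose_b` parameters):

   ```python
   from tvm import relay

   x = relay.var("x", shape=(8, 16), dtype="float32")
   w = relay.var("w", shape=(32, 16), dtype="float32")  # stored as (units, units_in)
   # equivalent to the old relay.nn.dense(x, w)
   y = relay.nn.matmul(x, w, transpose_a=False, transpose_b=True)
   ```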


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660225637



##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,6 +1471,47 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Matmul operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+        `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.

Review comment:
       No, the input of matmul is supposed to be a multi-dimensional tensor (not limited to 2-D). This is copied from the original `nn.dense`.
   
   Other frameworks like PyTorch also have such a definition.
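
   For instance (a sketch using the relay API from this PR; the `transpose_b` keyword assumes the renamed parameter):

   ```python
   from tvm import relay

   data = relay.var("data", shape=(4, 8, 16), dtype="float32")    # (d_1, d_2, units_in)
   weight = relay.var("weight", shape=(32, 16), dtype="float32")  # (units, units_in)
   out = relay.nn.matmul(data, weight, units=32, transpose_b=True)  # shape (4, 8, 32)
   ```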

##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -1160,21 +1186,46 @@ def batch_flatten_shape_func(attrs, inputs, _):
 
 
 @script
-def _dense_shape_func(data_shape, weight_shape):
+def _matmul_shape_func(data_shape, weight_shape, data_transposed, weight_transposed):
     out = output_tensor((data_shape.shape[0],), "int64")
     for i in const_range(out.shape[0] - 1):
         out[i] = data_shape[i]
-    out[out.shape[0] - 1] = weight_shape[0]
+    if data_transposed:
+        out[out.shape[0] - 2] = out[out.shape[0] - 1]
+    out[out.shape[0] - 1] = weight_shape[0] if weight_transposed else weight_shape[1]

Review comment:
       Since the data tensor can have more than two dimensions, this is the simplest way to implement the shape function.
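
   To make the intent concrete, a plain-Python sketch of the output-shape rule for non-transposed data (the transposed-data case follows the hybrid script above):

   ```python
   def matmul_out_shape(data_shape, weight_shape, transpose_b=False):
       out = list(data_shape[:-1])  # keep all leading dimensions of the data
       out.append(weight_shape[0] if transpose_b else weight_shape[1])
       return tuple(out)

   assert matmul_out_shape((4, 8, 16), (32, 16), transpose_b=True) == (4, 8, 32)
   assert matmul_out_shape((4, 16), (16, 32)) == (4, 32)
   ```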

##########
File path: python/tvm/relay/op/nn/nn.py
##########
@@ -1471,6 +1471,47 @@ def bias_add(data, bias, axis=1):
     return _make.bias_add(data, bias, axis)
 
 
+def matmul(data, weight, units=None, out_dtype="", data_transposed=False, weight_transposed=False):
+    """Matmul operator.
+    Applies a linear transformation. The X & W can be transposed.
+
+    .. math::
+
+        `Y = X * W`
+
+    Parameters
+    ----------
+    data : tvm.relay.Expr
+        The input data to the operator,
+        of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
+
+    weight : tvm.relay.Expr
+        The weight expressions, 2-D matrix,
+        of shape `(units_in, units)` or `(units, units_in)`.
+
+    units : int, optional
+        Number of hidden units of the matmul transformation.

Review comment:
       I think the existing doc already explains it: "The hidden units." This is copied from the original `nn.dense`.

##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -1204,7 +1208,7 @@ def from_tensorflow(self, graph, layout="NHWC", shape=None, outputs=None):
         return func, self._params
 
 
-def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
+def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):

Review comment:
       The problem is that we're not able to remove all of the `nn.dense` usages at this moment, and there are not enough AutoTVM templates for `nn.matmul`.
   
   So the use of `nn.matmul` can only be seen as an experimental feature. We should not change the default behavior, since that may affect those who are currently using `nn.dense`.
   

##########
File path: include/tvm/relay/attrs/nn.h
##########
@@ -961,6 +961,32 @@ struct AvgPool3DAttrs : public tvm::AttrsNode<AvgPool3DAttrs> {
   }
 };
 
+/*! \brief Attributes for matmul operator */
+struct MatmulAttrs : public tvm::AttrsNode<MatmulAttrs> {
+  IndexExpr units;
+  DataType out_dtype;
+  bool data_transposed;
+  bool weight_transposed;
+  tvm::String auto_scheduler_rewritten_layout;  // The layout after auto-scheduler's layout rewrite

Review comment:
       You mean `MatmulAttrs`? We're not able to remove all of the `nn.dense` usages at this moment, so `nn.dense` and `nn.matmul` still have to be two different ops for now, and they need different `Attrs`.

##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TT"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TN"
+        compute_tag = "matmul"
+    else:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_dense"

Review comment:
       I think it's fine since it is just an op name. 😄  But the tag `dense` has been used in some schedule checks, so I think we'd better keep that.
   
   There are some options I can come up with:
   - A: Use `T_dense` as the name and `dense` as the tag for the NT format; use `T_matmul` as the name and `matmul` as the tag for the other 3 formats.
   - B: Use `T_matmul_NN`, `T_matmul_NT`, `T_matmul_TN`, `T_matmul_TT` as the name for each format; use `dense` as the tag for the NT format and `matmul` as the tag for the others.
   
   What do you think?
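
   To make option B concrete, roughly (sketch only, not tested, using the renamed `transpose_a`/`transpose_b` flags):

   ```python
   compute_name = "T_matmul_{}{}".format(
       "T" if transpose_a else "N", "T" if transpose_b else "N"
   )
   # Keep the existing tag for the dense-compatible (NT) layout so the current
   # schedule checks that look for the "dense" tag keep working.
   compute_tag = "dense" if (not transpose_a and transpose_b) else "matmul"
   ```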

##########
File path: python/tvm/relay/op/nn/_nn.py
##########
@@ -1160,21 +1186,46 @@ def batch_flatten_shape_func(attrs, inputs, _):
 
 
 @script
-def _dense_shape_func(data_shape, weight_shape):
+def _matmul_shape_func(data_shape, weight_shape, data_transposed, weight_transposed):

Review comment:
       Updated all `data_transposed` & `weight_transposed` to `transpose_a` & `transpose_b`, and also renamed `data` & `weight` in matmul to `tensor_a` & `tensor_b`. Tensor names in dense remain unchanged.
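
   So the Python-level signature now looks roughly like this (a sketch, with the defaults carried over from the previous version):

   ```python
   def matmul(tensor_a, tensor_b, units=None, out_dtype="",
              transpose_a=False, transpose_b=False):
       ...
   ```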

##########
File path: python/tvm/relay/op/strategy/x86.py
##########
@@ -370,6 +370,55 @@ def conv1d_strategy_cpu(attrs, inputs, out_type, target):
     return strategy
 
 
+@matmul_strategy.register("cpu")
+def matmul_strategy_cpu(attrs, inputs, out_type, target):
+    """matmul x86 strategy"""
+    strategy = _op.OpStrategy()
+    if is_auto_scheduler_enabled():
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul, need_auto_scheduler_layout=True),
+            naive_schedule,
+            name="matmul.generic",
+            plevel=11,
+        )
+    else:
+        logger.warning("Matmul other than NT format is not optimized for x86.")
+        strategy.add_implementation(
+            wrap_compute_matmul(topi.nn.matmul),
+            naive_schedule,
+            name="matmul.generic",
+        )
+
+    same_type = inputs[0].dtype == inputs[1].dtype == out_type.dtype
+    dtype = inputs[0].dtype
+    u8s8s32 = dtype == "uint8" and inputs[1].dtype == "int8" and out_type.dtype == "int32"
+    if "cblas" in target.libs:

Review comment:
       I agree, but it seems there is no API for `SpecializedCondition` to handle the False path?
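
   Maybe something like this would work as a workaround, since only the context-manager form exists (a sketch):

   ```python
   from tvm.te import SpecializedCondition

   blas_ok = same_type and dtype in ["float32", "float64"]
   with SpecializedCondition(blas_ok):
       # register the cblas implementation here, as above
       pass
   # SpecializedCondition has no "else"/False path, so warn separately:
   if not blas_ok:
       logger.warning("cblas is in target.libs but cannot be used for this matmul.")
   ```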




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jcf94 commented on a change in pull request #8234: [Matmul] Add matmul op

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #8234:
URL: https://github.com/apache/tvm/pull/8234#discussion_r660233541



##########
File path: python/tvm/topi/nn/dense.py
##########
@@ -51,37 +65,120 @@ def dense(data, weight, bias=None, out_dtype=None, auto_scheduler_rewritten_layo
         assert len(bias.shape) == 1
     if out_dtype is None:
         out_dtype = data.dtype
-    batch, in_dim = data.shape
+    if data_transposed:
+        in_dim, batch = data.shape
+    else:
+        batch, in_dim = data.shape
 
     if auto_scheduler_rewritten_layout:
         # Infer shape for the rewritten layout
         out_dim, red_dim = auto_scheduler.get_shape_from_rewritten_layout(
-            auto_scheduler_rewritten_layout, ["j", "k"]
+            auto_scheduler_rewritten_layout, ["j", "k"] if weight_transposed else ["k", "j"]
         )
         auto_scheduler.remove_index_check(weight)
-    else:
+    elif weight_transposed:
         out_dim, red_dim = weight.shape
+    else:
+        red_dim, out_dim = weight.shape
     assert in_dim == red_dim
 
     k = te.reduce_axis((0, in_dim), name="k")
-    matmul = te.compute(
+    if data_transposed:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TT"
+        else:
+            compute_lambda = lambda i, j: te.sum(
+                data[k, i].astype(out_dtype) * weight[k, j].astype(out_dtype), axis=k
+            )
+            compute_name = "T_matmul_TN"
+        compute_tag = "matmul"
+    else:
+        if weight_transposed:
+            compute_lambda = lambda i, j: te.sum(
+                data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
+            )
+            compute_name = "T_dense"

Review comment:
       I think it's fine since it is just an op name. 😄  But the tag `dense` has been used in some schedule checks, so I think we'd better keep that.
   
   There are some options I can come up with:
   - A: Use `T_dense` as the name and `dense` as the tag for the NT format; use `T_matmul` as the name and `matmul` as the tag for the other 3 formats.
   - B: Use `T_matmul_NN`, `T_matmul_NT`, `T_matmul_TN`, `T_matmul_TT` as the name for each format; use `dense` as the tag for the NT format and `matmul` as the tag for the others.
   
   What do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org