
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/01/07 02:03:50 UTC

[GitHub] [incubator-tvm] jwfromm opened a new pull request #4639: [Relay/Topi][Op] Conv1D

jwfromm opened a new pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639
 
 
   This PR adds the Conv1D operation to Topi and Relay for both CPU and GPU. It supports both NCW and NWC layouts natively and is tested end-to-end in the ONNX frontend.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r364752270
 
 

 ##########
 File path: topi/python/topi/generic/nn.py
 ##########
 @@ -34,6 +34,42 @@ def _default_schedule(outs, auto_inline):
     return s
 
 
+@tvm.target.generic_func
+def schedule_conv1d_ncw(outs):
+    """Schedule for conv1d_ncw
+
+    Parameters
+    ----------
+    outs: Array of Tensor
+          The computation graph description of conv2d_hwcn
+          in the format of an array of tensors.
+
+    Returns
+    -------
+    sch: Schedule
+        The computation schedule for the op.
+    """
+    return _default_schedule(outs, False)
+
+
+@tvm.target.generic_func
+def schedule_conv1d_nwc(outs):
+    """Schedule for conv1d_nwc
+
+    Parameters
+    ----------
+    outs: Array of Tensor
+          The computation graph description of conv2d_hwcn
 
 Review comment:
   needs doc fix


[GitHub] [incubator-tvm] optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365054371
 
 

 ##########
 File path: tests/python/frontend/onnx/test_forward.py
 ##########
 @@ -1732,22 +1732,34 @@ def test_or():
     verify_or(indata=[x, y], dtype=bool)
 
 
-def verify_conv(x_shape, w_shape, y_shape, p):
-    node = helper.make_node('Conv',
-                            inputs=['x', 'W'],
-                            outputs=['y'],
-                            kernel_shape=[3, 3],
-                            # Default values for other attributes:
-                            # strides=[1, 1],
-                            # dilations=[1, 1],
-                            # groups=1
-                            pads=p,)
+def verify_conv(x_shape, w_shape, y_shape, padding, kernel_shape, strides, dilations, auto_pad="NOTSET"):
+    if padding is None:
+        node = helper.make_node('Conv',
+                                inputs=['x', 'W'],
+                                outputs=['y'],
+                                kernel_shape=kernel_shape,
+                                # Default values for other attributes:
+                                strides=strides,
+                                dilations=dilations,
+                                # groups=1
+                                auto_pad=auto_pad)
+    else:                                
+        node = helper.make_node('Conv',
+                                inputs=['x', 'W'],
+                                outputs=['y'],
+                                kernel_shape=kernel_shape,
+                                # Default values for other attributes:
+                                strides=strides,
+                                dilations=dilations,
+                                # groups=1
+                                pads=padding)
 
     graph = helper.make_graph([node],
                               'conv_test',
                               inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, list(x_shape)),
                                       helper.make_tensor_value_info("W", TensorProto.FLOAT, list(w_shape))],
-                              outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, list(y_shape))])
+                              outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, list(y_shape))]
+                              )
 
 Review comment:
   Why does ')' need a separate line?


[GitHub] [incubator-tvm] jwfromm commented on issue #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on issue #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#issuecomment-572411450
 
 
   @masahi, maybe you could take a look too.


[GitHub] [incubator-tvm] masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r364752378
 
 

 ##########
 File path: topi/python/topi/generic/nn.py
 ##########
 @@ -34,6 +34,42 @@ def _default_schedule(outs, auto_inline):
     return s
 
 
+@tvm.target.generic_func
+def schedule_conv1d_ncw(outs):
+    """Schedule for conv1d_ncw
+
+    Parameters
+    ----------
+    outs: Array of Tensor
+          The computation graph description of conv2d_hwcn
 
 Review comment:
   needs doc fix


[GitHub] [incubator-tvm] jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365379427
 
 

 ##########
 File path: topi/python/topi/cuda/conv1d.py
 ##########
 @@ -0,0 +1,308 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-argument
+"""Compute definition for conv1d with cuda backend"""
+import tvm
+from tvm import autotvm
+
+from .. import nn, generic
+from ..util import traverse_inline, get_const_tuple
+
+
+@autotvm.register_topi_compute(nn.conv1d, ['cuda', 'gpu'], ['direct'])
+def conv1d_cuda(cfg,
+                data,
+                kernel,
+                strides,
+                padding,
+                dilation,
+                layout='NCW',
+                out_dtype='float32'):
+    """ 1D convolution forward operator for cuda backend.
+
+    Parameters
+    ----------
+    cfg : ConfigEntity
+        The config for this template
+
+    data : tvm.Tensor
+        3-D input shape [batch, in_channel, in_width] for layout == 'NCW'
+        and [batch, in_width, in_channel] for layout == 'NWC'
+
+    kernel : tvm.Tensor
+        3-D kernel with shape [num_filter, in_channel, filter_size] for layout == 'NCW'
+        and [filter_size, in_channel, num_filter] for layout == 'NWC'
+
+    strides : int or tuple
+        The spatial stride along width
+
+    padding : int or str
+        Padding size, or ['VALID', 'SAME']
+
+    dilation : int or tuple
+        Dilation rate if convolution should be dilated.
+
+    layout : str
+        How input data is laid out, must be one of ['NCW', 'NWC']
+
+    out_dtype : str
+        The output data type. If None then output is same type as input.
+    """
+    if out_dtype is None:
+        out_dtype = data.dtype
+    if isinstance(strides, (tuple, list)):
+        strides = strides[0]
+    if isinstance(dilation, (tuple, list)):
+        dilation = dilation[0]
+
+    if layout == 'NCW':
+        return nn.conv1d_ncw(data, kernel, strides, padding, dilation,
+                             out_dtype)
+    if layout == 'NWC':
+        return nn.conv1d_nwc(data, kernel, strides, padding, dilation,
+                             out_dtype)
+    raise ValueError("This layout is not yet supported: {}".format(layout))
+
+
+@autotvm.register_topi_schedule(generic.schedule_conv1d_ncw, ["cuda", "gpu"],
+                                ["direct"])
+def schedule_conv1d_ncw(cfg, outs):
+    """TOPI schedule callback of conv1d ncw for cuda gpu
+
+    Parameters
+    ----------
+    cfg : ConfigEntity
+        the config for this template.
+
+    outs : Array of Tensor
+        The computation graph description of conv1d
+        in the format of an array of tensors.
+
+    Returns
+    -------
+    s : Schedule
+        The computation schedule for conv1d.
+    """
+    outs = [outs] if isinstance(outs, tvm.tensor.Tensor) else outs
+    s = tvm.create_schedule([x.op for x in outs])
+
+    def _callback(op):
+        if op.tag == 'conv1d_ncw':
+            pad_data = op.input_tensors[0]
+            kernel = op.input_tensors[1]
+            conv = op.output(0)
+
+            ##### space definition begin #####
+            n, f, x = s[conv].op.axis
+            rc = s[conv].op.reduce_axis[0]
+            cfg.define_split("tile_n", cfg.axis(n), num_outputs=4)
+            cfg.define_split("tile_f", cfg.axis(f), num_outputs=4)
+            cfg.define_split("tile_x", cfg.axis(x), num_outputs=4)
+            cfg.define_split("tile_rc", cfg.axis(rc), num_outputs=3)
+            cfg.define_knob("auto_unroll_max_step", [64, 512, 1500])
+
+            target = tvm.target.current_target()
+            if target.target_name in ['nvptx', 'rocm']:
+                cfg.define_knob("unroll_explicit", [1])
+            else:
+                cfg.define_knob("unroll_explicit", [0, 1])
+
+            ##### space definition end #####
+
+            if isinstance(kernel.op,
+                          tvm.tensor.ComputeOp) and 'dilate' in kernel.op.tag:
+                s[kernel].compute_inline()
+
+            if conv.op in s.outputs:
+                output = conv
+                OL = s.cache_write(conv, 'local')
+            else:
+                output = s.outputs[0].output(0)
+                s[conv].set_scope('local')
+                OL = conv
+
+            # create cache stage
+            s[pad_data].set_scope('shared')
+            AA = pad_data
+            WW = s.cache_read(kernel, 'shared', [OL])
+
+            # tile and bind spatial axes
+            n, f, x = s[output].op.axis
+            kernel_scope, n = s[output].split(n, nparts=1)
+            bn, vn, tn, ni = cfg["tile_n"].apply(s, output, n)
+            bf, vf, tf, fi = cfg["tile_f"].apply(s, output, f)
+            bx, vx, tx, xi = cfg["tile_x"].apply(s, output, x)
+
+            s[output].reorder(bn, bf, bx, vn, vf, vx, tn, tf, tx, ni, fi, xi)
+            s[output].bind(bn, tvm.thread_axis("blockIdx.z"))
+            s[output].bind(bf, tvm.thread_axis("blockIdx.y"))
+            s[output].bind(bx, tvm.thread_axis("blockIdx.x"))
+            s[output].bind(vn, tvm.thread_axis("vthread"))
+            s[output].bind(vf, tvm.thread_axis("vthread"))
+            s[output].bind(vx, tvm.thread_axis("vthread"))
+
+            s[output].bind(tx, tvm.thread_axis("threadIdx.x"))
+            s[OL].compute_at(s[output], tx)
+            # number of threads
+            n_tz = cfg["tile_n"].size[2] * cfg["tile_f"].size[2]
+            n_tx = cfg["tile_x"].size[2]
+
+            # tile reduction axes
+            n, f, x = s[OL].op.axis
+            rc, rx = s[OL].op.reduce_axis
+            rco, rcm, rci = cfg['tile_rc'].apply(s, OL, rc)
+            s[OL].reorder(rco, rcm, rx, rci, n, f, x)
+
+            s[AA].compute_at(s[OL], rx)
+            s[WW].compute_at(s[OL], rx)
+
+            # cooperative fetching
+            for load in [AA, WW]:
+                n, f, x = s[load].op.axis
+                fused = s[load].fuse(f, x)
+                tz, fused = s[load].split(fused, nparts=n_tz)
+                tx, fused = s[load].split(fused, nparts=n_tx)
+                s[load].bind(tz, tvm.thread_axis("threadIdx.y"))
+                s[load].bind(tx, tvm.thread_axis("threadIdx.x"))
+
+            s[output].pragma(kernel_scope, 'auto_unroll_max_step',
+                             cfg['auto_unroll_max_step'].val)
+            s[output].pragma(kernel_scope, 'unroll_explicit',
+                             cfg['unroll_explicit'].val)
+
+            N, CO, OW = get_const_tuple(output.shape)
+            _, CI, KW = get_const_tuple(kernel.shape)
+            cfg.add_flop(2 * N * OW * CO * KW * CI)
+
+    traverse_inline(s, outs[0].op, _callback)
+
+    return s
+
+
+@autotvm.register_topi_schedule(generic.schedule_conv1d_nwc, ["cuda", "gpu"],
+                                ["direct"])
+def schedule_conv1d_nwc(cfg, outs):
+    """TOPI schedule callback of conv1d nwc for cuda gpu
+
+    Parameters
+    ----------
+    cfg : ConfigEntity
+        the config for this template.
+
+    outs : Array of Tensor
+        The computation graph description of conv1d
+        in the format of an array of tensors.
+
+    Returns
+    -------
+    s : Schedule
+        The computation schedule for conv1d.
+    """
+    outs = [outs] if isinstance(outs, tvm.tensor.Tensor) else outs
+    s = tvm.create_schedule([x.op for x in outs])
+
+    def _callback(op):
+        if op.tag == 'conv1d_nwc':
+            pad_data = op.input_tensors[0]
+            kernel = op.input_tensors[1]
+            conv = op.output(0)
+
+            ##### space definition begin #####
+            n, x, f = s[conv].op.axis
+            rc = s[conv].op.reduce_axis[0]
+            cfg.define_split("tile_n", cfg.axis(n), num_outputs=4)
+            cfg.define_split("tile_x", cfg.axis(x), num_outputs=4)
+            cfg.define_split("tile_f", cfg.axis(f), num_outputs=4)
+            cfg.define_split("tile_rc", cfg.axis(rc), num_outputs=3)
+            cfg.define_knob("auto_unroll_max_step", [64, 512, 1500])
+
+            target = tvm.target.current_target()
+            if target.target_name in ['nvptx', 'rocm']:
+                cfg.define_knob("unroll_explicit", [1])
+            else:
+                cfg.define_knob("unroll_explicit", [0, 1])
+
+            ##### space definition end #####
+
+            if isinstance(kernel.op,
+                          tvm.tensor.ComputeOp) and 'dilate' in kernel.op.tag:
+                s[kernel].compute_inline()
+
+            if conv.op in s.outputs:
+                output = conv
+                OL = s.cache_write(conv, 'local')
+            else:
+                output = s.outputs[0].output(0)
+                s[conv].set_scope('local')
+                OL = conv
+
+            # create cache stage
+            s[pad_data].set_scope('shared')
+            AA = pad_data
+            WW = s.cache_read(kernel, 'shared', [OL])
+
+            # tile and bind spatial axes
+            n, f, x = s[output].op.axis
+            kernel_scope, n = s[output].split(n, nparts=1)
+            bn, vn, tn, ni = cfg["tile_n"].apply(s, output, n)
+            bx, vx, tx, xi = cfg["tile_x"].apply(s, output, x)
+            bf, vf, tf, fi = cfg["tile_f"].apply(s, output, f)
+
+            s[output].reorder(bn, bx, bf, vn, vx, vf, tn, tx, tf, ni, xi, fi)
+            s[output].bind(bn, tvm.thread_axis("blockIdx.z"))
+            s[output].bind(bx, tvm.thread_axis("blockIdx.y"))
+            s[output].bind(bf, tvm.thread_axis("blockIdx.x"))
+            s[output].bind(vn, tvm.thread_axis("vthread"))
+            s[output].bind(vx, tvm.thread_axis("vthread"))
+            s[output].bind(vf, tvm.thread_axis("vthread"))
+
+            s[output].bind(tf, tvm.thread_axis("threadIdx.x"))
+            s[OL].compute_at(s[output], tf)
+            # number of threads
+            n_tz = cfg["tile_n"].size[2] * cfg["tile_x"].size[2]
+            n_tx = cfg["tile_f"].size[2]
+
+            # tile reduction axes
+            n, x, f = s[OL].op.axis
+            rc, rx = s[OL].op.reduce_axis
+            rco, rcm, rci = cfg['tile_rc'].apply(s, OL, rc)
+            s[OL].reorder(rco, rcm, rx, rci, n, x, f)
+
+            s[AA].compute_at(s[OL], rx)
+            s[WW].compute_at(s[OL], rx)
+
+            # cooperative fetching
+            for load in [AA, WW]:
+                n, x, f = s[load].op.axis
+                fused = s[load].fuse(x, f)
+                tz, fused = s[load].split(fused, nparts=n_tz)
+                tx, fused = s[load].split(fused, nparts=n_tx)
+                s[load].bind(tz, tvm.thread_axis("threadIdx.y"))
+                s[load].bind(tx, tvm.thread_axis("threadIdx.x"))
+
+            s[output].pragma(kernel_scope, 'auto_unroll_max_step',
+                             cfg['auto_unroll_max_step'].val)
+            s[output].pragma(kernel_scope, 'unroll_explicit',
+                             cfg['unroll_explicit'].val)
+
+            N, OW, CO = get_const_tuple(output.shape)
+            KW, CI, _ = get_const_tuple(kernel.shape)
+            cfg.add_flop(2 * N * OW * CO * KW * CI)
+
+    traverse_inline(s, outs[0].op, _callback)
+
+    return s
 
 Review comment:
   Unfortunately, the templates frequently use shapes that differ between layouts. Combining the schedules would require a bunch of ugly if/else statements and would probably not save much space. Also keep in mind that these CUDA schedules are pretty basic and may not be optimal for each layout. Keeping them separate makes it easier to improve one or the other in the future.
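
   To make the layout point concrete, here is a minimal sketch of the kind of branching a shared schedule would need just to unpack shapes. It is not code from the PR: `conv1d_flops` is a hypothetical helper, and only the axis orders and the FLOP formula are taken from the diff above.

```python
def conv1d_flops(output_shape, kernel_shape, layout):
    """Multiply-add FLOP count for a direct conv1d (hypothetical helper).

    The axis orders mirror the two schedules quoted above:
      NCW: output [batch, out_channel, out_width], kernel [num_filter, in_channel, filter_size]
      NWC: output [batch, out_width, out_channel], kernel [filter_size, in_channel, num_filter]
    """
    if layout == "NCW":
        n, co, ow = output_shape
        _, ci, kw = kernel_shape
    elif layout == "NWC":
        n, ow, co = output_shape
        kw, ci, _ = kernel_shape
    else:
        raise ValueError("This layout is not yet supported: {}".format(layout))
    # Same formula as cfg.add_flop(...) in both schedules.
    return 2 * n * ow * co * kw * ci


# The same logical convolution, described in each layout.
ncw_flops = conv1d_flops((1, 16, 30), (16, 16, 3), "NCW")
nwc_flops = conv1d_flops((1, 30, 16), (3, 16, 16), "NWC")
```

   Both calls describe the same work, so they return the same count. A merged schedule would need a similar branch at each point where the axes are unpacked, tiled, or bound.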


[GitHub] [incubator-tvm] masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365523407
 
 

 ##########
 File path: topi/python/topi/x86/conv1d.py
 ##########
 @@ -0,0 +1,131 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name,unused-variable,unused-argument,invalid-name
+"""Conv1D schedule on for Intel CPU"""
+from __future__ import absolute_import as _abs
+import tvm
+from .. import generic, tag
+
+
+@generic.schedule_conv1d_ncw.register(["cpu"])
+def schedule_conv1d_ncw(outs):
+    """Create schedule for tensors"""
+    s = tvm.create_schedule([x.op for x in outs])
+    output_op = outs[0].op
+    scheduled_ops = []
+
+    def traverse(op):
+        """Traverse operators from computation graph"""
+        # inline all one-to-one-mapping operators except the last stage (output)
+        if tag.is_broadcast(op.tag):
+            if op not in s.outputs:
+                s[op].compute_inline()
+            else: # inject custom schedule
+                if len(op.axis) == 3: # schedule bias + bn + relu
+                    n, c, w = op.axis
+                    fused = s[op].fuse(n, c)
+                    s[op].parallel(fused)
+                    s[op].vectorize(w)
+            for tensor in op.input_tensors:
+                if isinstance(tensor.op, tvm.tensor.ComputeOp) and tensor.op not in scheduled_ops:
+                    traverse(tensor.op)
+
+        if 'conv1d_ncw' in op.tag:
+            conv = op.output(0)
+            kernel = op.input_tensors[1]
+            if isinstance(kernel.op, tvm.tensor.ComputeOp) and "dilate" in kernel.op.tag:
+                s[kernel].compute_inline()
+
+            data = op.input_tensors[0]
+            data_pad = None
+            if isinstance(data.op, tvm.tensor.ComputeOp) and "pad" in data.op.tag:
+                data_pad = data
+                data = data_pad.op.input_tensors[0]
+
+            n_pad, c_pad, w_pad = data_pad.op.axis
+            pad_fused = s[data_pad].fuse(n_pad, c_pad)
+            s[data_pad].parallel(pad_fused)
+            C = conv
+            n, c, w = C.op.axis
+            rc, rw = C.op.reduce_axis
+            n_out, c_out, w_out = output_op.axis
+            s[C].vectorize(w)
 
 Review comment:
   This doesn't vectorize if the width is not a multiple of the vector length.
   But it is good as a first step.
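
   As a rough illustration of the concern, the sketch below splits an output width into full vector chunks plus a leftover tail. It is plain Python, not TVM code; `vector_plan` and the vector length of 4 are assumptions made for the example. A width of 6 is what an input width of 8 with a kernel of size 3 and 'VALID' padding would produce.

```python
def vector_plan(width, vector_length):
    """Return (start, length) chunks covering `width` elements.

    Full chunks have `vector_length` elements; a final shorter chunk
    holds whatever is left over. Hypothetical helper for illustration.
    """
    if vector_length <= 0:
        raise ValueError("vector_length must be positive")
    full, tail = divmod(width, vector_length)
    plan = [(i * vector_length, vector_length) for i in range(full)]
    if tail:
        # Leftover elements that do not fill a whole vector.
        plan.append((full * vector_length, tail))
    return plan


uneven = vector_plan(6, 4)   # one full chunk and a tail of 2
even = vector_plan(8, 4)     # two full chunks, no tail
```

   One common pattern in TVM schedules is to split the axis by a fixed factor and vectorize only the inner part, but the thread does not settle whether that is the right follow-up here.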
   


[GitHub] [incubator-tvm] jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365377388
 
 

 ##########
 File path: tests/python/frontend/onnx/test_forward.py
 ##########
 @@ -1732,22 +1732,34 @@ def test_or():
     verify_or(indata=[x, y], dtype=bool)
 
 
-def verify_conv(x_shape, w_shape, y_shape, p):
-    node = helper.make_node('Conv',
-                            inputs=['x', 'W'],
-                            outputs=['y'],
-                            kernel_shape=[3, 3],
-                            # Default values for other attributes:
-                            # strides=[1, 1],
-                            # dilations=[1, 1],
-                            # groups=1
-                            pads=p,)
+def verify_conv(x_shape, w_shape, y_shape, padding, kernel_shape, strides, dilations, auto_pad="NOTSET"):
+    if padding is None:
+        node = helper.make_node('Conv',
+                                inputs=['x', 'W'],
+                                outputs=['y'],
+                                kernel_shape=kernel_shape,
+                                # Default values for other attributes:
+                                strides=strides,
+                                dilations=dilations,
+                                # groups=1
+                                auto_pad=auto_pad)
+    else:                                
+        node = helper.make_node('Conv',
+                                inputs=['x', 'W'],
+                                outputs=['y'],
+                                kernel_shape=kernel_shape,
+                                # Default values for other attributes:
+                                strides=strides,
+                                dilations=dilations,
+                                # groups=1
+                                pads=padding)
 
 Review comment:
   ONNX is very picky about these arguments. If the auto_pad attribute is used, the pads attribute must not be provided at all.
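
   A minimal sketch of that constraint, assuming the attributes are collected in a dict first: `conv_node_attrs` is a hypothetical helper, not part of the PR or of the onnx package. It emits either `auto_pad` or `pads`, never both.

```python
def conv_node_attrs(kernel_shape, strides, dilations, padding=None, auto_pad="NOTSET"):
    """Build keyword attributes for a Conv node (hypothetical helper).

    When explicit padding is given, only `pads` is emitted; otherwise only
    `auto_pad` is emitted, so the two never appear together.
    """
    attrs = {
        "kernel_shape": list(kernel_shape),
        "strides": list(strides),
        "dilations": list(dilations),
    }
    if padding is None:
        attrs["auto_pad"] = auto_pad
    else:
        attrs["pads"] = list(padding)
    return attrs


same_attrs = conv_node_attrs([3], [1], [1], padding=None, auto_pad="SAME_UPPER")
explicit_attrs = conv_node_attrs([3], [1], [1], padding=[1, 1])

# With onnx installed, the dict could be passed on as keyword attributes, e.g.
#   helper.make_node('Conv', inputs=['x', 'W'], outputs=['y'], **same_attrs)
```

   Building the dict first would also avoid repeating the whole make_node call in both branches, though the PR keeps the two calls separate.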


[GitHub] [incubator-tvm] optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365497175
 
 

 ##########
 File path: topi/tests/python/test_topi_conv1d.py
 ##########
 @@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Test code for transposed convolution."""
+import numpy as np
+import itertools
+import tvm
+import topi
+import topi.testing
+from tvm.contrib.pickle_memoize import memoize
+from topi.util import get_const_tuple
+from common import get_all_backend
+
+
+def verify_conv1d(batch,
+                  in_channels,
+                  in_width,
+                  filters,
+                  kernel_size=3,
+                  stride=1,
+                  dilation=1,
+                  padding='VALID',
+                  layout='NCW'):
+    if layout == 'NCW':
+        in_shape = [batch, in_channels, in_width]
+        kernel_shape = [filters, in_channels, kernel_size]
+    else:
+        in_shape = [batch, in_width, in_channels]
+        kernel_shape = [kernel_size, in_channels, filters]
+
+    dtype = 'float32'
+    A = tvm.placeholder(in_shape, name='A', dtype=dtype)
+    W = tvm.placeholder(kernel_shape, name='W', dtype=dtype)
+
+    def get_ref_data(layout):
+        a_np = np.random.uniform(size=in_shape).astype(dtype)
+        w_np = np.random.uniform(size=kernel_shape).astype(dtype)
+        if layout == 'NWC':
+            np_in = np.transpose(a_np, [0, 2, 1])
+            np_w = np.transpose(w_np, [2, 1, 0])
+        else:
+            np_in = a_np
+            np_w = w_np
+        b_np = topi.testing.conv1d_ncw_python(np_in, np_w, stride, padding, dilation)
+        if layout == 'NWC':
+            b_np = np.transpose(b_np, [0, 2, 1])
+        return a_np, w_np, b_np
+
+    a_np, w_np, b_np = get_ref_data(layout)
+
+    def check_device(device):
+        ctx = tvm.context(device, 0)
+        if not ctx.exist:
+            print("Skip because %s is not enabled" % device)
+            return
+        with tvm.target.create(device):
+            B = topi.nn.conv1d(A, W, stride, padding, dilation, layout, 'float32')
+            if layout == 'NCW':
+                s = topi.generic.schedule_conv1d_ncw([B])
+            else:
+                s = topi.generic.schedule_conv1d_nwc([B])
+
+        a = tvm.nd.array(a_np, ctx)
+        w = tvm.nd.array(w_np, ctx)
+        b = tvm.nd.array(np.zeros(get_const_tuple(B.shape), dtype=dtype), ctx)
+
+        func = tvm.build(s, [A, W, B], device)
+        func(a, w, b)
+        tvm.testing.assert_allclose(b.asnumpy(), b_np, rtol=1e-5)
+
+    for device in get_all_backend():
+        check_device(device)
+
+
+def test_conv1d():
+    for layout in ["NCW", "NWC"]:
+        # Most basic test case
+        verify_conv1d(1, 1, 8, 1, 3, 1, 1, 'VALID', layout)
+        # With padding
+        verify_conv1d(1, 1, 8, 1, 3, 1, 1, 'SAME', layout)
+        # Realistic dimensions
+        verify_conv1d(1, 16, 32, 16, 3, 1, 1, 'SAME', layout)
+        # With stride
+        verify_conv1d(1, 16, 32, 16, 3, 2, 1, 'SAME', layout)
+        # With dilation
+        verify_conv1d(1, 16, 32, 16, 3, 1, 2, 'SAME', layout)
 
 Review comment:
   Could you please add several more tests to cover multiple batches and other kernel sizes, such as 1 and 2?
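
   For reference, the requested cases can be checked by hand with a tiny pure-Python convolution. This is only a sketch: `conv1d_ncw_valid` is a hypothetical stand-in, not `topi.testing.conv1d_ncw_python`, and it handles 'VALID' padding only.

```python
def conv1d_ncw_valid(data, kernel, stride=1, dilation=1):
    """Direct conv1d in NCW layout with no padding (hypothetical reference).

    data:   nested lists shaped [batch][in_channel][width]
    kernel: nested lists shaped [num_filter][in_channel][filter_size]
    """
    batch, in_c, width = len(data), len(data[0]), len(data[0][0])
    filters, ksize = len(kernel), len(kernel[0][0])
    span = (ksize - 1) * dilation + 1          # width covered by the dilated kernel
    out_w = (width - span) // stride + 1
    out = [[[0.0] * out_w for _ in range(filters)] for _ in range(batch)]
    for n in range(batch):
        for f in range(filters):
            for x in range(out_w):
                acc = 0.0
                for c in range(in_c):
                    for k in range(ksize):
                        acc += data[n][c][x * stride + k * dilation] * kernel[f][c][k]
                out[n][f][x] = acc
    return out


# Batch of 2, one input channel, width 4.
data = [[[1.0, 2.0, 3.0, 4.0]], [[5.0, 6.0, 7.0, 8.0]]]
k1_out = conv1d_ncw_valid(data, [[[2.0]]])        # kernel size 1
k2_out = conv1d_ncw_valid(data, [[[1.0, 1.0]]])   # kernel size 2
```

   A kernel of size 1 only scales each element, and a kernel of size 2 with unit weights sums neighbouring pairs, which makes both cases easy to verify for every batch entry.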


[GitHub] [incubator-tvm] masahi commented on issue #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

masahi commented on issue #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#issuecomment-573358545
 
 
   Thanks @jwfromm @optima2005 


[GitHub] [incubator-tvm] jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365539993
 
 

 ##########
 File path: topi/python/topi/x86/conv1d.py
 ##########
 @@ -0,0 +1,131 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name,unused-variable,unused-argument,invalid-name
+"""Conv1D schedule for Intel CPU"""
+from __future__ import absolute_import as _abs
+import tvm
+from .. import generic, tag
+
+
+@generic.schedule_conv1d_ncw.register(["cpu"])
+def schedule_conv1d_ncw(outs):
+    """Create schedule for tensors"""
+    s = tvm.create_schedule([x.op for x in outs])
+    output_op = outs[0].op
+    scheduled_ops = []
+
+    def traverse(op):
+        """Traverse operators from computation graph"""
+        # inline all one-to-one-mapping operators except the last stage (output)
+        if tag.is_broadcast(op.tag):
+            if op not in s.outputs:
+                s[op].compute_inline()
+            else: # inject custom schedule
+                if len(op.axis) == 3: # schedule bias + bn + relu
+                    n, c, w = op.axis
+                    fused = s[op].fuse(n, c)
+                    s[op].parallel(fused)
+                    s[op].vectorize(w)
+            for tensor in op.input_tensors:
+                if isinstance(tensor.op, tvm.tensor.ComputeOp) and tensor.op not in scheduled_ops:
+                    traverse(tensor.op)
+
+        if 'conv1d_ncw' in op.tag:
+            conv = op.output(0)
+            kernel = op.input_tensors[1]
+            if isinstance(kernel.op, tvm.tensor.ComputeOp) and "dilate" in kernel.op.tag:
+                s[kernel].compute_inline()
+
+            data = op.input_tensors[0]
+            data_pad = None
+            if isinstance(data.op, tvm.tensor.ComputeOp) and "pad" in data.op.tag:
+                data_pad = data
+                data = data_pad.op.input_tensors[0]
+
+            n_pad, c_pad, w_pad = data_pad.op.axis
+            pad_fused = s[data_pad].fuse(n_pad, c_pad)
+            s[data_pad].parallel(pad_fused)
+            C = conv
+            n, c, w = C.op.axis
+            rc, rw = C.op.reduce_axis
+            n_out, c_out, w_out = output_op.axis
+            s[C].vectorize(w)
 
 Review comment:
   Good point, but I suspect these schedules will eventually be replaced entirely by a templatized version, as in conv2d. As a first step, these are much better than nothing, even when unvectorized.


[GitHub] [incubator-tvm] jwfromm commented on issue #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on issue #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#issuecomment-571796687
 
 
   @optima2005, @kevinthesun, can one of you take a look at this PR?


[GitHub] [incubator-tvm] jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365400991
 
 

 ##########
 File path: python/tvm/relay/frontend/onnx.py
 ##########
 @@ -223,37 +223,64 @@ class Conv(OnnxOpConverter):
 
     @classmethod
     def _impl_v1(cls, inputs, attr, params):
-        # infer pads for auto_pad
+        # Use shape of input to determine convolution type.
+        input_shape = infer_shape(inputs[0])
+
         if 'auto_pad' in attr:
             attr['auto_pad'] = attr['auto_pad'].decode('utf-8')
             if attr['auto_pad'] in ('SAME_UPPER', 'SAME_LOWER'):
-                input_shape = infer_shape(inputs[0])
-                in_h, in_w = input_shape[2], input_shape[3]
-                stride_h, stride_w = attr['strides']
-                kernel_h, kernel_w = attr['kernel_shape']
-                dilation_h, dilation_w = attr['dilations']
-                dilated_kernel_h = (kernel_h - 1) * dilation_h + 1
-                dilated_kernel_w = (kernel_w - 1) * dilation_w + 1
-                pad_v = get_pad_pair(in_h, dilated_kernel_h, stride_h)
-                pad_h = get_pad_pair(in_w, dilated_kernel_w, stride_w)
-                attr['pads'] = (pad_v[0], pad_h[0], pad_v[1], pad_h[1])
+                pad_tuple = []
+                for axis in range(len(input_shape) - 2):
+                    axis_shape = input_shape[2 + axis]
+                    stride = attr['strides'][axis]
+                    kernel = attr['kernel_shape'][axis]
+                    dilation = attr['dilations'][axis]
+                    dilated_kernel = (kernel - 1) * dilation + 1
+                    pad = get_pad_pair(axis_shape, dilated_kernel, stride)
+                    pad_tuple.append(pad)
+                pad_tuple = tuple([val for pair in zip(*pad_tuple) for val in pair])
+                attr['pads'] = pad_tuple
             elif attr['auto_pad'] == 'VALID':
-                attr['pads'] = (0, 0)
+                attr['pads'] = tuple([0 for i in range(len(input_shape) - 2)])
             elif attr['auto_pad'] == 'NOTSET':
                 pass
             else:
                 msg = 'Value {} in attribute "auto_pad" of operator Conv is invalid.'
-                raise tvm.error.OpAttributeInvalid(msg.format(attr['auto_pad']))
+                raise tvm.error.OpAttributeInvalid(
+                    msg.format(attr['auto_pad']))
             attr.pop('auto_pad')
 
-        out = AttrCvt(
-            op_name=dimension_picker('conv'),
-            transforms={
+        # Handle attribute conversion for different convolution types
+
+        # Conv1D
+        if len(input_shape) == 3:
 
 Review comment:
   I've updated the conv converter to be much cleaner. You're right that better use of dimension_picker and dimension_constraint simplified it dramatically.
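
   The per-axis auto_pad loop in the diff above can be sketched in isolation as follows. The get_pad_pair body here is a stand-in written for illustration (assumed to follow the usual SAME rule, with any odd element placed after); the function name same_pads is hypothetical.

```python
def get_pad_pair(input1d, kernel1d, stride1d):
    """Stand-in for the frontend helper (assumed behavior): total SAME
    padding for one axis, split into (before, after)."""
    if input1d % stride1d == 0:
        pad = max(kernel1d - stride1d, 0)
    else:
        pad = max(kernel1d - (input1d % stride1d), 0)
    pad_before = pad // 2
    return (pad_before, pad - pad_before)


def same_pads(spatial_shape, kernel_shape, strides, dilations):
    """Mirror of the loop in the diff: one (before, after) pair per spatial
    axis, then flattened so all begin values precede all end values."""
    pairs = []
    for size, kernel, stride, dilation in zip(
            spatial_shape, kernel_shape, strides, dilations):
        dilated_kernel = (kernel - 1) * dilation + 1
        pairs.append(get_pad_pair(size, dilated_kernel, stride))
    # zip(*pairs) regroups [(b0, a0), (b1, a1)] into (b0, b1), (a0, a1).
    return tuple(val for group in zip(*pairs) for val in group)


print(same_pads([8], [3], [1], [1]))              # 1D case
print(same_pads([32, 7], [3, 2], [1, 1], [2, 1]))  # 2D case
```

   Because the loop runs over len(input_shape) - 2 axes, the same code path serves 1D and 2D inputs, which is the point of the rewrite.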


[GitHub] [incubator-tvm] masahi merged pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

masahi merged pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639
 
 
   


[GitHub] [incubator-tvm] optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365054153
 
 

 ##########
 File path: tests/python/frontend/onnx/test_forward.py
 ##########
 @@ -1732,22 +1732,34 @@ def test_or():
     verify_or(indata=[x, y], dtype=bool)
 
 
-def verify_conv(x_shape, w_shape, y_shape, p):
-    node = helper.make_node('Conv',
-                            inputs=['x', 'W'],
-                            outputs=['y'],
-                            kernel_shape=[3, 3],
-                            # Default values for other attributes:
-                            # strides=[1, 1],
-                            # dilations=[1, 1],
-                            # groups=1
-                            pads=p,)
+def verify_conv(x_shape, w_shape, y_shape, padding, kernel_shape, strides, dilations, auto_pad="NOTSET"):
+    if padding is None:
+        node = helper.make_node('Conv',
+                                inputs=['x', 'W'],
+                                outputs=['y'],
+                                kernel_shape=kernel_shape,
+                                # Default values for other attributes:
+                                strides=strides,
+                                dilations=dilations,
+                                # groups=1
+                                auto_pad=auto_pad)
+    else:                                
+        node = helper.make_node('Conv',
+                                inputs=['x', 'W'],
+                                outputs=['y'],
+                                kernel_shape=kernel_shape,
+                                # Default values for other attributes:
+                                strides=strides,
+                                dilations=dilations,
+                                # groups=1
+                                pads=padding)
 
 Review comment:
   Can you use a conditional only for 'pads'? For example:
   pads = padding if padding else auto_pad


[GitHub] [incubator-tvm] optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365055937
 
 

 ##########
 File path: topi/python/topi/cuda/conv1d.py
 ##########
 @@ -0,0 +1,308 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-argument
+"""Compute definition for conv1d with cuda backend"""
+import tvm
+from tvm import autotvm
+
+from .. import nn, generic
+from ..util import traverse_inline, get_const_tuple
+
+
+@autotvm.register_topi_compute(nn.conv1d, ['cuda', 'gpu'], ['direct'])
+def conv1d_cuda(cfg,
+                data,
+                kernel,
+                strides,
+                padding,
+                dilation,
+                layout='NCW',
+                out_dtype='float32'):
+    """ 1D convolution forward operator for cuda backend.
+
+    Parameters
+    ----------
+    cfg : ConfigEntity
+        The config for this template
+
+    data : tvm.Tensor
+        3-D input shape [batch, in_channel, in_width] for layout == 'NCW'
+        and [batch, in_width, in_channel] for layout == 'NWC'
+
+    kernel : tvm.Tensor
+        3-D kernel with shape [num_filter, in_channel, filter_size] for layout == 'NCW'
+        and [filter_size, in_channel, num_filter] for layout == 'NWC'
+
+    strides : int or tuple
+        The spatial stride along width
+
+    padding : int or str
+        Padding size, or ['VALID', 'SAME']
+
+    dilation : int or tuple
+        Dilation rate if convolution should be dilated.
+
+    layout : str
+        How input data is laid out, must be one of ['NCW', 'NWC']
+
+    out_dtype : str
+        The output data type. If None then output is same type as input.
+    """
+    if out_dtype is None:
+        out_dtype = data.dtype
+    if isinstance(strides, (tuple, list)):
+        strides = strides[0]
+    if isinstance(dilation, (tuple, list)):
+        dilation = dilation[0]
+
+    if layout == 'NCW':
+        return nn.conv1d_ncw(data, kernel, strides, padding, dilation,
+                             out_dtype)
+    if layout == 'NWC':
+        return nn.conv1d_nwc(data, kernel, strides, padding, dilation,
+                             out_dtype)
+    raise ValueError("This layout is not yet supported: {}".format(layout))
+
+
+@autotvm.register_topi_schedule(generic.schedule_conv1d_ncw, ["cuda", "gpu"],
+                                ["direct"])
+def schedule_conv1d_ncw(cfg, outs):
+    """TOPI schedule callback of conv1d ncw for cuda gpu
+
+    Parameters
+    ----------
+    cfg : ConfigEntity
+        the config for this template.
+
+    outs : Array of Tensor
+        The computation graph description of conv1d
+        in the format of an array of tensors.
+
+    Returns
+    -------
+    s : Schedule
+        The computation schedule for conv1d.
+    """
+    outs = [outs] if isinstance(outs, tvm.tensor.Tensor) else outs
+    s = tvm.create_schedule([x.op for x in outs])
+
+    def _callback(op):
+        if op.tag == 'conv1d_ncw':
+            pad_data = op.input_tensors[0]
+            kernel = op.input_tensors[1]
+            conv = op.output(0)
+
+            ##### space definition begin #####
+            n, f, x = s[conv].op.axis
+            rc = s[conv].op.reduce_axis[0]
+            cfg.define_split("tile_n", cfg.axis(n), num_outputs=4)
+            cfg.define_split("tile_f", cfg.axis(f), num_outputs=4)
+            cfg.define_split("tile_x", cfg.axis(x), num_outputs=4)
+            cfg.define_split("tile_rc", cfg.axis(rc), num_outputs=3)
+            cfg.define_knob("auto_unroll_max_step", [64, 512, 1500])
+
+            target = tvm.target.current_target()
+            if target.target_name in ['nvptx', 'rocm']:
+                cfg.define_knob("unroll_explicit", [1])
+            else:
+                cfg.define_knob("unroll_explicit", [0, 1])
+
+            ##### space definition end #####
+
+            if isinstance(kernel.op,
+                          tvm.tensor.ComputeOp) and 'dilate' in kernel.op.tag:
+                s[kernel].compute_inline()
+
+            if conv.op in s.outputs:
+                output = conv
+                OL = s.cache_write(conv, 'local')
+            else:
+                output = s.outputs[0].output(0)
+                s[conv].set_scope('local')
+                OL = conv
+
+            # create cache stage
+            s[pad_data].set_scope('shared')
+            AA = pad_data
+            WW = s.cache_read(kernel, 'shared', [OL])
+
+            # tile and bind spatial axes
+            n, f, x = s[output].op.axis
+            kernel_scope, n = s[output].split(n, nparts=1)
+            bn, vn, tn, ni = cfg["tile_n"].apply(s, output, n)
+            bf, vf, tf, fi = cfg["tile_f"].apply(s, output, f)
+            bx, vx, tx, xi = cfg["tile_x"].apply(s, output, x)
+
+            s[output].reorder(bn, bf, bx, vn, vf, vx, tn, tf, tx, ni, fi, xi)
+            s[output].bind(bn, tvm.thread_axis("blockIdx.z"))
+            s[output].bind(bf, tvm.thread_axis("blockIdx.y"))
+            s[output].bind(bx, tvm.thread_axis("blockIdx.x"))
+            s[output].bind(vn, tvm.thread_axis("vthread"))
+            s[output].bind(vf, tvm.thread_axis("vthread"))
+            s[output].bind(vx, tvm.thread_axis("vthread"))
+
+            s[output].bind(tx, tvm.thread_axis("threadIdx.x"))
+            s[OL].compute_at(s[output], tx)
+            # number of threads
+            n_tz = cfg["tile_n"].size[2] * cfg["tile_f"].size[2]
+            n_tx = cfg["tile_x"].size[2]
+
+            # tile reduction axes
+            n, f, x = s[OL].op.axis
+            rc, rx = s[OL].op.reduce_axis
+            rco, rcm, rci = cfg['tile_rc'].apply(s, OL, rc)
+            s[OL].reorder(rco, rcm, rx, rci, n, f, x)
+
+            s[AA].compute_at(s[OL], rx)
+            s[WW].compute_at(s[OL], rx)
+
+            # cooperative fetching
+            for load in [AA, WW]:
+                n, f, x = s[load].op.axis
+                fused = s[load].fuse(f, x)
+                tz, fused = s[load].split(fused, nparts=n_tz)
+                tx, fused = s[load].split(fused, nparts=n_tx)
+                s[load].bind(tz, tvm.thread_axis("threadIdx.y"))
+                s[load].bind(tx, tvm.thread_axis("threadIdx.x"))
+
+            s[output].pragma(kernel_scope, 'auto_unroll_max_step',
+                             cfg['auto_unroll_max_step'].val)
+            s[output].pragma(kernel_scope, 'unroll_explicit',
+                             cfg['unroll_explicit'].val)
+
+            N, CO, OW = get_const_tuple(output.shape)
+            _, CI, KW = get_const_tuple(kernel.shape)
+            cfg.add_flop(2 * N * OW * CO * KW * CI)
+
+    traverse_inline(s, outs[0].op, _callback)
+
+    return s
+
+
+@autotvm.register_topi_schedule(generic.schedule_conv1d_nwc, ["cuda", "gpu"],
+                                ["direct"])
+def schedule_conv1d_nwc(cfg, outs):
+    """TOPI schedule callback of conv1d nwc for cuda gpu
+
+    Parameters
+    ----------
+    cfg : ConfigEntity
+        the config for this template.
+
+    outs : Array of Tensor
+        The computation graph description of conv1d
+        in the format of an array of tensors.
+
+    Returns
+    -------
+    s : Schedule
+        The computation schedule for conv1d.
+    """
+    outs = [outs] if isinstance(outs, tvm.tensor.Tensor) else outs
+    s = tvm.create_schedule([x.op for x in outs])
+
+    def _callback(op):
+        if op.tag == 'conv1d_nwc':
+            pad_data = op.input_tensors[0]
+            kernel = op.input_tensors[1]
+            conv = op.output(0)
+
+            ##### space definition begin #####
+            n, x, f = s[conv].op.axis
+            rc = s[conv].op.reduce_axis[0]
+            cfg.define_split("tile_n", cfg.axis(n), num_outputs=4)
+            cfg.define_split("tile_x", cfg.axis(x), num_outputs=4)
+            cfg.define_split("tile_f", cfg.axis(f), num_outputs=4)
+            cfg.define_split("tile_rc", cfg.axis(rc), num_outputs=3)
+            cfg.define_knob("auto_unroll_max_step", [64, 512, 1500])
+
+            target = tvm.target.current_target()
+            if target.target_name in ['nvptx', 'rocm']:
+                cfg.define_knob("unroll_explicit", [1])
+            else:
+                cfg.define_knob("unroll_explicit", [0, 1])
+
+            ##### space definition end #####
+
+            if isinstance(kernel.op,
+                          tvm.tensor.ComputeOp) and 'dilate' in kernel.op.tag:
+                s[kernel].compute_inline()
+
+            if conv.op in s.outputs:
+                output = conv
+                OL = s.cache_write(conv, 'local')
+            else:
+                output = s.outputs[0].output(0)
+                s[conv].set_scope('local')
+                OL = conv
+
+            # create cache stage
+            s[pad_data].set_scope('shared')
+            AA = pad_data
+            WW = s.cache_read(kernel, 'shared', [OL])
+
+            # tile and bind spatial axes
+            n, x, f = s[output].op.axis
+            kernel_scope, n = s[output].split(n, nparts=1)
+            bn, vn, tn, ni = cfg["tile_n"].apply(s, output, n)
+            bx, vx, tx, xi = cfg["tile_x"].apply(s, output, x)
+            bf, vf, tf, fi = cfg["tile_f"].apply(s, output, f)
+
+            s[output].reorder(bn, bx, bf, vn, vx, vf, tn, tx, tf, ni, xi, fi)
+            s[output].bind(bn, tvm.thread_axis("blockIdx.z"))
+            s[output].bind(bx, tvm.thread_axis("blockIdx.y"))
+            s[output].bind(bf, tvm.thread_axis("blockIdx.x"))
+            s[output].bind(vn, tvm.thread_axis("vthread"))
+            s[output].bind(vx, tvm.thread_axis("vthread"))
+            s[output].bind(vf, tvm.thread_axis("vthread"))
+
+            s[output].bind(tf, tvm.thread_axis("threadIdx.x"))
+            s[OL].compute_at(s[output], tf)
+            # number of threads
+            n_tz = cfg["tile_n"].size[2] * cfg["tile_x"].size[2]
+            n_tx = cfg["tile_f"].size[2]
+
+            # tile reduction axes
+            n, x, f = s[OL].op.axis
+            rc, rx = s[OL].op.reduce_axis
+            rco, rcm, rci = cfg['tile_rc'].apply(s, OL, rc)
+            s[OL].reorder(rco, rcm, rx, rci, n, x, f)
+
+            s[AA].compute_at(s[OL], rx)
+            s[WW].compute_at(s[OL], rx)
+
+            # cooperative fetching
+            for load in [AA, WW]:
+                n, x, f = s[load].op.axis
+                fused = s[load].fuse(x, f)
+                tz, fused = s[load].split(fused, nparts=n_tz)
+                tx, fused = s[load].split(fused, nparts=n_tx)
+                s[load].bind(tz, tvm.thread_axis("threadIdx.y"))
+                s[load].bind(tx, tvm.thread_axis("threadIdx.x"))
+
+            s[output].pragma(kernel_scope, 'auto_unroll_max_step',
+                             cfg['auto_unroll_max_step'].val)
+            s[output].pragma(kernel_scope, 'unroll_explicit',
+                             cfg['unroll_explicit'].val)
+
+            N, OW, CO = get_const_tuple(output.shape)
+            KW, CI, _ = get_const_tuple(kernel.shape)
+            cfg.add_flop(2 * N * OW * CO * KW * CI)
+
+    traverse_inline(s, outs[0].op, _callback)
+
+    return s
 
 Review comment:
   Can the templates for NCW and NWC call a unified function? Most of the code is the same.
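
   One way to picture the unification, as a sketch only: let the layout string decide how axes and shapes are unpacked, and share everything else. The helper names named_axes and conv1d_flops are hypothetical; the FLOP formula and the error message are taken from the two templates in the diff.

```python
def named_axes(axes, layout):
    """Return (batch, channel, width) entries from a 3-element sequence,
    using the layout string ('NCW' or 'NWC') to locate each one."""
    if sorted(layout) != sorted('NCW') or len(axes) != 3:
        raise ValueError("This layout is not yet supported: {}".format(layout))
    return (axes[layout.index('N')],
            axes[layout.index('C')],
            axes[layout.index('W')])


def conv1d_flops(out_shape, kernel_shape, layout):
    """FLOP count shared by both templates; only the unpacking differs."""
    n, co, ow = named_axes(out_shape, layout)
    if layout == 'NCW':
        _, ci, kw = kernel_shape   # [num_filter, in_channel, filter_size]
    else:
        kw, ci, _ = kernel_shape   # [filter_size, in_channel, num_filter]
    return 2 * n * ow * co * kw * ci


print(conv1d_flops((1, 16, 32), (16, 16, 3), 'NCW'))
print(conv1d_flops((1, 32, 16), (3, 16, 16), 'NWC'))
```

   A shared schedule body could take the layout as a parameter and call a helper like named_axes wherever the two templates currently differ only in axis order.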


[GitHub] [incubator-tvm] jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365377388
 
 

 ##########
 File path: tests/python/frontend/onnx/test_forward.py
 ##########
 @@ -1732,22 +1732,34 @@ def test_or():
     verify_or(indata=[x, y], dtype=bool)
 
 
-def verify_conv(x_shape, w_shape, y_shape, p):
-    node = helper.make_node('Conv',
-                            inputs=['x', 'W'],
-                            outputs=['y'],
-                            kernel_shape=[3, 3],
-                            # Default values for other attributes:
-                            # strides=[1, 1],
-                            # dilations=[1, 1],
-                            # groups=1
-                            pads=p,)
+def verify_conv(x_shape, w_shape, y_shape, padding, kernel_shape, strides, dilations, auto_pad="NOTSET"):
+    if padding is None:
+        node = helper.make_node('Conv',
+                                inputs=['x', 'W'],
+                                outputs=['y'],
+                                kernel_shape=kernel_shape,
+                                # Default values for other attributes:
+                                strides=strides,
+                                dilations=dilations,
+                                # groups=1
+                                auto_pad=auto_pad)
+    else:                                
+        node = helper.make_node('Conv',
+                                inputs=['x', 'W'],
+                                outputs=['y'],
+                                kernel_shape=kernel_shape,
+                                # Default values for other attributes:
+                                strides=strides,
+                                dilations=dilations,
+                                # groups=1
+                                pads=padding)
 
 Review comment:
   Onnx is very picky about these arguments. If the auto_pad attribute is used, the pads attribute must not be provided at all. Note that auto_pad and pads are two separate attributes.
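
   A sketch of how the duplicated make_node call could still be avoided while keeping the two attributes mutually exclusive: build the keyword arguments once and add exactly one of auto_pad or pads. The helper name conv_node_kwargs is hypothetical; the resulting dict would be passed as helper.make_node('Conv', **kwargs) in the real test.

```python
def conv_node_kwargs(kernel_shape, strides, dilations,
                     padding=None, auto_pad='NOTSET'):
    """Collect Conv node attributes, emitting either 'auto_pad' or 'pads'
    but never both."""
    kwargs = dict(inputs=['x', 'W'],
                  outputs=['y'],
                  kernel_shape=kernel_shape,
                  strides=strides,
                  dilations=dilations)
    if padding is None:
        # No explicit padding given: rely on auto_pad and omit 'pads'.
        kwargs['auto_pad'] = auto_pad
    else:
        # Explicit padding given: omit 'auto_pad' entirely.
        kwargs['pads'] = padding
    return kwargs


print(conv_node_kwargs([3], [1], [1], padding=None, auto_pad='SAME_UPPER'))
print(conv_node_kwargs([3, 3], [1, 1], [1, 1], padding=[1, 1, 1, 1]))
```

   The check is `padding is None` rather than truthiness, matching the condition already used in the diff.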


[GitHub] [incubator-tvm] optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365048590
 
 

 ##########
 File path: python/tvm/relay/frontend/onnx.py
 ##########
 @@ -223,37 +223,64 @@ class Conv(OnnxOpConverter):
 
     @classmethod
     def _impl_v1(cls, inputs, attr, params):
-        # infer pads for auto_pad
+        # Use shape of input to determine convolution type.
+        input_shape = infer_shape(inputs[0])
+
         if 'auto_pad' in attr:
             attr['auto_pad'] = attr['auto_pad'].decode('utf-8')
             if attr['auto_pad'] in ('SAME_UPPER', 'SAME_LOWER'):
-                input_shape = infer_shape(inputs[0])
-                in_h, in_w = input_shape[2], input_shape[3]
-                stride_h, stride_w = attr['strides']
-                kernel_h, kernel_w = attr['kernel_shape']
-                dilation_h, dilation_w = attr['dilations']
-                dilated_kernel_h = (kernel_h - 1) * dilation_h + 1
-                dilated_kernel_w = (kernel_w - 1) * dilation_w + 1
-                pad_v = get_pad_pair(in_h, dilated_kernel_h, stride_h)
-                pad_h = get_pad_pair(in_w, dilated_kernel_w, stride_w)
-                attr['pads'] = (pad_v[0], pad_h[0], pad_v[1], pad_h[1])
+                pad_tuple = []
+                for axis in range(len(input_shape) - 2):
+                    axis_shape = input_shape[2 + axis]
+                    stride = attr['strides'][axis]
+                    kernel = attr['kernel_shape'][axis]
+                    dilation = attr['dilations'][axis]
+                    dilated_kernel = (kernel - 1) * dilation + 1
+                    pad = get_pad_pair(axis_shape, dilated_kernel, stride)
+                    pad_tuple.append(pad)
+                pad_tuple = tuple([val for pair in zip(*pad_tuple) for val in pair])
+                attr['pads'] = pad_tuple
             elif attr['auto_pad'] == 'VALID':
-                attr['pads'] = (0, 0)
+                attr['pads'] = tuple([0 for i in range(len(input_shape) - 2)])
             elif attr['auto_pad'] == 'NOTSET':
                 pass
             else:
                 msg = 'Value {} in attribute "auto_pad" of operator Conv is invalid.'
-                raise tvm.error.OpAttributeInvalid(msg.format(attr['auto_pad']))
+                raise tvm.error.OpAttributeInvalid(
+                    msg.format(attr['auto_pad']))
             attr.pop('auto_pad')
 
-        out = AttrCvt(
-            op_name=dimension_picker('conv'),
-            transforms={
+        # Handle attribute conversion for different convolution types
+
+        # Conv1D
+        if len(input_shape) == 3:
 
 Review comment:
   I suggest updating dimension_picker() and dimension_constraint(); please see the TensorFlow frontend. Then only those default values need to be switched conditionally.
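
   A rough sketch of what this suggestion could look like, assuming both helpers key off the rank of attr['kernel_shape']. The bodies below are illustrative and are not copied from either frontend; the error text and the suffix parameter are assumptions.

```python
def dimension_picker(prefix, suffix=''):
    """Hypothetical sketch: return a function that builds an op name such as
    'conv1d' or 'conv2d' from the rank of attr['kernel_shape']."""
    def _impl(attr):
        rank = len(attr['kernel_shape'])
        if rank in (1, 2):
            return '{}{}d{}'.format(prefix, rank, suffix)
        raise ValueError(
            'Only 1D and 2D kernels are handled in this sketch, got rank {}'
            .format(rank))
    return _impl


def dimension_constraint():
    """Hypothetical sketch: return a (check, message) pair that accepts
    only 1D and 2D kernels."""
    def _dim_check(attr):
        return len(attr['kernel_shape']) in (1, 2)
    return _dim_check, 'Only 1D and 2D kernels are supported.'


pick = dimension_picker('conv')
print(pick({'kernel_shape': [3]}))
print(pick({'kernel_shape': [3, 3]}))
```

   With the op name chosen this way, the converter needs no separate branch per rank; only rank-dependent defaults would remain conditional.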


[GitHub] [incubator-tvm] optima2005 commented on issue #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

optima2005 commented on issue #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#issuecomment-573305933
 
 
   LGTM.  Thanks @jwfromm 
   But I am not quite confident about the schedule template part. @masahi 
   


[GitHub] [incubator-tvm] jwfromm commented on issue #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on issue #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#issuecomment-573218930
 
 
   @optima2005 I think I've addressed all your comments. Can you take another look and let me know what you think?


[GitHub] [incubator-tvm] jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365539927
 
 

 ##########
 File path: topi/tests/python/test_topi_conv1d.py
 ##########
 @@ -0,0 +1,110 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Test code for 1D convolution."""
+import numpy as np
+import itertools
+import tvm
+import topi
+import topi.testing
+from tvm.contrib.pickle_memoize import memoize
+from topi.util import get_const_tuple
+from common import get_all_backend
+
+
+def verify_conv1d(batch,
+                  in_channels,
+                  in_width,
+                  filters,
+                  kernel_size=3,
+                  stride=1,
+                  dilation=1,
+                  padding='VALID',
+                  layout='NCW'):
+    if layout == 'NCW':
+        in_shape = [batch, in_channels, in_width]
+        kernel_shape = [filters, in_channels, kernel_size]
+    else:
+        in_shape = [batch, in_width, in_channels]
+        kernel_shape = [kernel_size, in_channels, filters]
+
+    dtype = 'float32'
+    A = tvm.placeholder(in_shape, name='A', dtype=dtype)
+    W = tvm.placeholder(kernel_shape, name='W', dtype=dtype)
+
+    def get_ref_data(layout):
+        a_np = np.random.uniform(size=in_shape).astype(dtype)
+        w_np = np.random.uniform(size=kernel_shape).astype(dtype)
+        if layout == 'NWC':
+            np_in = np.transpose(a_np, [0, 2, 1])
+            np_w = np.transpose(w_np, [2, 1, 0])
+        else:
+            np_in = a_np
+            np_w = w_np
+        b_np = topi.testing.conv1d_ncw_python(np_in, np_w, stride, padding, dilation)
+        if layout == 'NWC':
+            b_np = np.transpose(b_np, [0, 2, 1])
+        return a_np, w_np, b_np
+
+    a_np, w_np, b_np = get_ref_data(layout)
+
+    def check_device(device):
+        ctx = tvm.context(device, 0)
+        if not ctx.exist:
+            print("Skip because %s is not enabled" % device)
+            return
+        with tvm.target.create(device):
+            B = topi.nn.conv1d(A, W, stride, padding, dilation, layout, 'float32')
+            if layout == 'NCW':
+                s = topi.generic.schedule_conv1d_ncw([B])
+            else:
+                s = topi.generic.schedule_conv1d_nwc([B])
+
+        a = tvm.nd.array(a_np, ctx)
+        w = tvm.nd.array(w_np, ctx)
+        b = tvm.nd.array(np.zeros(get_const_tuple(B.shape), dtype=dtype), ctx)
+
+        func = tvm.build(s, [A, W, B], device)
+        func(a, w, b)
+        tvm.testing.assert_allclose(b.asnumpy(), b_np, rtol=1e-5)
+
+    for device in get_all_backend():
+        check_device(device)
+
+
+def test_conv1d():
+    for layout in ["NCW", "NWC"]:
+        # Most basic test case
+        verify_conv1d(1, 1, 8, 1, 3, 1, 1, 'VALID', layout)
+        # With padding
+        verify_conv1d(1, 1, 8, 1, 3, 1, 1, 'SAME', layout)
+        # Realistic dimensions
+        verify_conv1d(1, 16, 32, 16, 3, 1, 1, 'SAME', layout)
+        # With stride
+        verify_conv1d(1, 16, 32, 16, 3, 2, 1, 'SAME', layout)
+        # With dilation
+        verify_conv1d(1, 16, 32, 16, 3, 1, 2, 'SAME', layout)
+        # Large batch size
+        verify_conv1d(8, 16, 32, 16, 3, 1, 1, 'SAME', layout)
+        # Other kernel sizes
+        verify_conv1d(1, 16, 32, 16, 3, 1, 1, 'SAME', layout)
+        verify_conv1d(1, 16, 32, 16, 2, 1, 1, 'SAME', layout)
+        verify_conv1d(1, 16, 32, 16, 1, 1, 1, 'SAME', layout)
+
 
 Review comment:
   Just added some tests for non-power-of-two shapes.

[GitHub] [incubator-tvm] masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365523678
 
 

 ##########
 File path: topi/tests/python/test_topi_conv1d.py
 ##########
 @@ -0,0 +1,110 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Test code for transposed convolution."""
+import numpy as np
+import itertools
+import tvm
+import topi
+import topi.testing
+from tvm.contrib.pickle_memoize import memoize
+from topi.util import get_const_tuple
+from common import get_all_backend
+
+
+def verify_conv1d(batch,
+                  in_channels,
+                  in_width,
+                  filters,
+                  kernel_size=3,
+                  stride=1,
+                  dilation=1,
+                  padding='VALID',
+                  layout='NCW'):
+    if layout == 'NCW':
+        in_shape = [batch, in_channels, in_width]
+        kernel_shape = [filters, in_channels, kernel_size]
+    else:
+        in_shape = [batch, in_width, in_channels]
+        kernel_shape = [kernel_size, in_channels, filters]
+
+    dtype = 'float32'
+    A = tvm.placeholder(in_shape, name='A', dtype=dtype)
+    W = tvm.placeholder(kernel_shape, name='W', dtype=dtype)
+
+    def get_ref_data(layout):
+        a_np = np.random.uniform(size=in_shape).astype(dtype)
+        w_np = np.random.uniform(size=kernel_shape).astype(dtype)
+        if layout == 'NWC':
+            np_in = np.transpose(a_np, [0, 2, 1])
+            np_w = np.transpose(w_np, [2, 1, 0])
+        else:
+            np_in = a_np
+            np_w = w_np
+        b_np = topi.testing.conv1d_ncw_python(np_in, np_w, stride, padding, dilation)
+        if layout == 'NWC':
+            b_np = np.transpose(b_np, [0, 2, 1])
+        return a_np, w_np, b_np
+
+    a_np, w_np, b_np = get_ref_data(layout)
+
+    def check_device(device):
+        ctx = tvm.context(device, 0)
+        if not ctx.exist:
+            print("Skip because %s is not enabled" % device)
+            return
+        with tvm.target.create(device):
+            B = topi.nn.conv1d(A, W, stride, padding, dilation, layout, 'float32')
+            if layout == 'NCW':
+                s = topi.generic.schedule_conv1d_ncw([B])
+            else:
+                s = topi.generic.schedule_conv1d_nwc([B])
+
+        a = tvm.nd.array(a_np, ctx)
+        w = tvm.nd.array(w_np, ctx)
+        b = tvm.nd.array(np.zeros(get_const_tuple(B.shape), dtype=dtype), ctx)
+
+        func = tvm.build(s, [A, W, B], device)
+        func(a, w, b)
+        tvm.testing.assert_allclose(b.asnumpy(), b_np, rtol=1e-5)
+
+    for device in get_all_backend():
+        check_device(device)
+
+
+def test_conv1d():
+    for layout in ["NCW", "NWC"]:
+        # Most basic test case
+        verify_conv1d(1, 1, 8, 1, 3, 1, 1, 'VALID', layout)
+        # With padding
+        verify_conv1d(1, 1, 8, 1, 3, 1, 1, 'SAME', layout)
+        # Realistic dimensions
+        verify_conv1d(1, 16, 32, 16, 3, 1, 1, 'SAME', layout)
+        # With stride
+        verify_conv1d(1, 16, 32, 16, 3, 2, 1, 'SAME', layout)
+        # With dilation
+        verify_conv1d(1, 16, 32, 16, 3, 1, 2, 'SAME', layout)
+        # Large batch size
+        verify_conv1d(8, 16, 32, 16, 3, 1, 1, 'SAME', layout)
+        # Other kernel sizes
+        verify_conv1d(1, 16, 32, 16, 3, 1, 1, 'SAME', layout)
+        verify_conv1d(1, 16, 32, 16, 2, 1, 1, 'SAME', layout)
+        verify_conv1d(1, 16, 32, 16, 1, 1, 1, 'SAME', layout)
+
 
 Review comment:
   It would be better to add tests for input widths that are not a power of 4 or 8.
   But I don't know whether the width can be arbitrary in NLP applications.
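
   A minimal sketch of what such a check could look like, written against plain NumPy rather than TOPI. The helper `conv1d_ncw_ref` is a hypothetical stand-in, not `topi.testing.conv1d_ncw_python`, and its 'SAME' rule (pad by the dilated kernel size minus one, with the extra element on the left) is an assumption about the intended convention:

```python
import numpy as np


def conv1d_ncw_ref(data, kernel, stride=1, padding="VALID", dilation=1):
    """Naive NCW conv1d reference (illustrative only, not the TOPI helper)."""
    batch, in_c, in_w = data.shape
    out_c, k_in_c, k_w = kernel.shape
    assert in_c == k_in_c, "channel mismatch between data and kernel"

    # Width spanned by the kernel once dilation gaps are included.
    dilated_k = (k_w - 1) * dilation + 1

    if padding == "SAME":
        # Assumed convention: total padding is dilated_k - 1.
        pad_total = dilated_k - 1
        pad_left = (pad_total + 1) // 2
        pad_right = pad_total - pad_left
    else:  # 'VALID'
        pad_left = pad_right = 0

    padded = np.pad(data, ((0, 0), (0, 0), (pad_left, pad_right)), mode="constant")
    out_w = (in_w + pad_left + pad_right - dilated_k) // stride + 1
    out = np.zeros((batch, out_c, out_w), dtype=data.dtype)

    for x in range(out_w):
        start = x * stride
        # Window of shape (batch, in_c, k_w), sampled every `dilation` steps.
        window = padded[:, :, start:start + dilated_k:dilation]
        out[:, :, x] = np.einsum("bck,fck->bf", window, kernel)
    return out


# Widths that are deliberately not powers of two.
for in_width in (5, 7, 13, 30):
    a = np.random.uniform(size=(1, 4, in_width)).astype("float32")
    w = np.random.uniform(size=(6, 4, 3)).astype("float32")
    print(in_width, conv1d_ncw_ref(a, w, padding="SAME").shape)
```

   With stride 1 the 'SAME' output width equals the input width under this rule, so odd widths such as 5, 7 and 13 exercise the uneven left/right padding split only when the kernel size is even.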

[GitHub] [incubator-tvm] masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365498346
 
 

 ##########
 File path: topi/tests/python/test_topi_conv1d.py
 ##########
 @@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Test code for transposed convolution."""
+import numpy as np
+import itertools
+import tvm
+import topi
+import topi.testing
+from tvm.contrib.pickle_memoize import memoize
+from topi.util import get_const_tuple
+from common import get_all_backend
+
+
+def verify_conv1d(batch,
+                  in_channels,
+                  in_width,
+                  filters,
+                  kernel_size=3,
+                  stride=1,
+                  dilation=1,
+                  padding='VALID',
+                  layout='NCW'):
+    if layout == 'NCW':
+        in_shape = [batch, in_channels, in_width]
+        kernel_shape = [filters, in_channels, kernel_size]
+    else:
+        in_shape = [batch, in_width, in_channels]
+        kernel_shape = [kernel_size, in_channels, filters]
+
+    dtype = 'float32'
+    A = tvm.placeholder(in_shape, name='A', dtype=dtype)
+    W = tvm.placeholder(kernel_shape, name='W', dtype=dtype)
+
+    def get_ref_data(layout):
+        a_np = np.random.uniform(size=in_shape).astype(dtype)
+        w_np = np.random.uniform(size=kernel_shape).astype(dtype)
+        if layout == 'NWC':
+            np_in = np.transpose(a_np, [0, 2, 1])
+            np_w = np.transpose(w_np, [2, 1, 0])
+        else:
+            np_in = a_np
+            np_w = w_np
+        b_np = topi.testing.conv1d_ncw_python(np_in, np_w, stride, padding, dilation)
+        if layout == 'NWC':
+            b_np = np.transpose(b_np, [0, 2, 1])
+        return a_np, w_np, b_np
+
+    a_np, w_np, b_np = get_ref_data(layout)
+
+    def check_device(device):
+        ctx = tvm.context(device, 0)
+        if not ctx.exist:
+            print("Skip because %s is not enabled" % device)
+            return
+        with tvm.target.create(device):
+            B = topi.nn.conv1d(A, W, stride, padding, dilation, layout, 'float32')
+            if layout == 'NCW':
+                s = topi.generic.schedule_conv1d_ncw([B])
+            else:
+                s = topi.generic.schedule_conv1d_nwc([B])
+
+        a = tvm.nd.array(a_np, ctx)
+        w = tvm.nd.array(w_np, ctx)
+        b = tvm.nd.array(np.zeros(get_const_tuple(B.shape), dtype=dtype), ctx)
+
+        func = tvm.build(s, [A, W, B], device)
+        func(a, w, b)
+        tvm.testing.assert_allclose(b.asnumpy(), b_np, rtol=1e-5)
+
+    for device in get_all_backend():
+        check_device(device)
+
+
+def test_conv1d():
+    for layout in ["NCW", "NWC"]:
+        # Most basic test case
+        verify_conv1d(1, 1, 8, 1, 3, 1, 1, 'VALID', layout)
+        # With padding
+        verify_conv1d(1, 1, 8, 1, 3, 1, 1, 'SAME', layout)
+        # Realistic dimensions
+        verify_conv1d(1, 16, 32, 16, 3, 1, 1, 'SAME', layout)
+        # With stride
+        verify_conv1d(1, 16, 32, 16, 3, 2, 1, 'SAME', layout)
+        # With dilation
+        verify_conv1d(1, 16, 32, 16, 3, 1, 2, 'SAME', layout)
 
 Review comment:
   Are batch sizes greater than 1 and kernel sizes like 1 or 2 common in the NLP use case? @jwfromm

[GitHub] [incubator-tvm] jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365504204
 
 

 ##########
 File path: topi/tests/python/test_topi_conv1d.py
 ##########
 @@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Test code for transposed convolution."""
+import numpy as np
+import itertools
+import tvm
+import topi
+import topi.testing
+from tvm.contrib.pickle_memoize import memoize
+from topi.util import get_const_tuple
+from common import get_all_backend
+
+
+def verify_conv1d(batch,
+                  in_channels,
+                  in_width,
+                  filters,
+                  kernel_size=3,
+                  stride=1,
+                  dilation=1,
+                  padding='VALID',
+                  layout='NCW'):
+    if layout == 'NCW':
+        in_shape = [batch, in_channels, in_width]
+        kernel_shape = [filters, in_channels, kernel_size]
+    else:
+        in_shape = [batch, in_width, in_channels]
+        kernel_shape = [kernel_size, in_channels, filters]
+
+    dtype = 'float32'
+    A = tvm.placeholder(in_shape, name='A', dtype=dtype)
+    W = tvm.placeholder(kernel_shape, name='W', dtype=dtype)
+
+    def get_ref_data(layout):
+        a_np = np.random.uniform(size=in_shape).astype(dtype)
+        w_np = np.random.uniform(size=kernel_shape).astype(dtype)
+        if layout == 'NWC':
+            np_in = np.transpose(a_np, [0, 2, 1])
+            np_w = np.transpose(w_np, [2, 1, 0])
+        else:
+            np_in = a_np
+            np_w = w_np
+        b_np = topi.testing.conv1d_ncw_python(np_in, np_w, stride, padding, dilation)
+        if layout == 'NWC':
+            b_np = np.transpose(b_np, [0, 2, 1])
+        return a_np, w_np, b_np
+
+    a_np, w_np, b_np = get_ref_data(layout)
+
+    def check_device(device):
+        ctx = tvm.context(device, 0)
+        if not ctx.exist:
+            print("Skip because %s is not enabled" % device)
+            return
+        with tvm.target.create(device):
+            B = topi.nn.conv1d(A, W, stride, padding, dilation, layout, 'float32')
+            if layout == 'NCW':
+                s = topi.generic.schedule_conv1d_ncw([B])
+            else:
+                s = topi.generic.schedule_conv1d_nwc([B])
+
+        a = tvm.nd.array(a_np, ctx)
+        w = tvm.nd.array(w_np, ctx)
+        b = tvm.nd.array(np.zeros(get_const_tuple(B.shape), dtype=dtype), ctx)
+
+        func = tvm.build(s, [A, W, B], device)
+        func(a, w, b)
+        tvm.testing.assert_allclose(b.asnumpy(), b_np, rtol=1e-5)
+
+    for device in get_all_backend():
+        check_device(device)
+
+
+def test_conv1d():
+    for layout in ["NCW", "NWC"]:
+        # Most basic test case
+        verify_conv1d(1, 1, 8, 1, 3, 1, 1, 'VALID', layout)
+        # With padding
+        verify_conv1d(1, 1, 8, 1, 3, 1, 1, 'SAME', layout)
+        # Realistic dimensions
+        verify_conv1d(1, 16, 32, 16, 3, 1, 1, 'SAME', layout)
+        # With stride
+        verify_conv1d(1, 16, 32, 16, 3, 2, 1, 'SAME', layout)
+        # With dilation
+        verify_conv1d(1, 16, 32, 16, 3, 1, 2, 'SAME', layout)
 
 Review comment:
   Testing larger batches and other kernel sizes is reasonable and @optima2005 is right that it should be added. My latest commit includes these tests along with some simple x86 schedules that improve performance by about 30X.
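
   One way to enumerate the extra batch and kernel-size cases, sketched with `itertools.product`. `conv1d_test_cases` is a hypothetical helper and not part of this PR; the tuples follow the positional order of `verify_conv1d` in the diff above, and the fixed sizes 16, 32, 16 are simply the "realistic dimensions" reused from that test:

```python
import itertools


def conv1d_test_cases(batches=(1, 8), kernel_sizes=(1, 2, 3), layouts=("NCW", "NWC")):
    """Yield argument tuples in the positional order verify_conv1d expects:
    (batch, in_channels, in_width, filters, kernel_size, stride, dilation,
     padding, layout).
    """
    for layout, batch, kernel_size in itertools.product(layouts, batches, kernel_sizes):
        # Stride and dilation stay at 1 so only batch and kernel size vary.
        yield (batch, 16, 32, 16, kernel_size, 1, 1, "SAME", layout)


cases = list(conv1d_test_cases())
for case in cases:
    # In the real test file each tuple would be passed on as verify_conv1d(*case).
    print(case)
```

   Generating the combinations keeps the test body short, at the cost of losing the per-case comments that the hand-written calls carry.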

[GitHub] [incubator-tvm] jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r364983601
 
 

 ##########
 File path: src/relay/op/nn/convolution.cc
 ##########
 @@ -52,6 +50,67 @@ Array<Array<Layout> > ConvInferCorrectLayout(
                                    params->data_layout : params->out_layout}};
 }
 
+
+// relay.nn.conv1d
+TVM_REGISTER_NODE_TYPE(Conv1DAttrs);
+
+// Positional relay function to create conv1d operator
+// used by frontend FFI.
+Expr MakeConv1D(Expr data,
 
 Review comment:
   Good catch, you're right that it was very straightforward to combine all the MakeConv calls. My latest commit includes this change.

[GitHub] [incubator-tvm] jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365401102
 
 

 ##########
 File path: tests/python/frontend/onnx/test_forward.py
 ##########
 @@ -1761,18 +1773,95 @@ def verify_conv(x_shape, w_shape, y_shape, p):
 
 def test_conv():
     # Convolution with padding
-    # (1, 1, 5, 5) input tensor
-    # (1, 1, 3, 3) tensor for convolution weights
-    # (1, 1, 5, 5) output tensor
-    # [1, 1, 1, 1] list for pads
-    verify_conv((1, 1, 5, 5), (1, 1, 3, 3), (1, 1, 5, 5), [1, 1, 1, 1])
+    # Conv2D
+    verify_conv(x_shape=(1, 1, 5, 5),
+                w_shape=(1, 1, 3, 3),
+                y_shape=(1, 1, 5, 5),
+                padding=[1, 1, 1, 1],
+                kernel_shape=[3, 3],
+                strides=[1, 1],
+                dilations=[1, 1])
+    # Conv1D
+    verify_conv(x_shape=(1, 1, 5),
+                w_shape=(1, 1, 3),
+                y_shape=(1, 1, 5),
+                padding=[1, 1],
+                kernel_shape=[3,],
+                strides=[1,],
+                dilations=[1,])
 
     # Convolution without padding
-    # (1, 1, 5, 5) input tensor
-    # (1, 1, 3, 3) tensor for convolution weights
-    # (1, 1, 3, 3) output tensor
-    # [0, 0, 0, 0] list for pads
-    verify_conv((1, 1, 5, 5), (1, 1, 3, 3), (1, 1, 3, 3), [0, 0, 0, 0])
+    # Conv2D
+    verify_conv(x_shape=(1, 1, 5, 5),
+                w_shape=(1, 1, 3, 3),
+                y_shape=(1, 1, 3, 3),
+                padding=[0, 0, 0, 0],
+                kernel_shape=[3, 3],
+                strides=[1, 1],
+                dilations=[1, 1])
+    # Conv1D
+    verify_conv(x_shape=(1, 1, 5),
+                w_shape=(1, 1, 3),
+                y_shape=(1, 1, 3),
+                padding=[0, 0],
+                kernel_shape=[3,],
+                strides=[1,],
+                dilations=[1,])
+
+    # Convolution with autopadding
+    verify_conv(x_shape=(1, 1, 5, 5),
+                w_shape=(1, 1, 3, 3),
+                y_shape=(1, 1, 5, 5),
+                kernel_shape=[3, 3],
+                strides=[1, 1],
+                dilations=[1, 1],
+                padding=None,
+                auto_pad="SAME_UPPER")
+    # Conv1D
+    verify_conv(x_shape=(1, 1, 5),
+                w_shape=(1, 1, 3),
+                y_shape=(1, 1, 5),
+                kernel_shape=[3,],
+                strides=[1,],
+                dilations=[1,],
+                padding=None,
+                auto_pad="SAME_UPPER")
+
+    # Convolution with non uniform stride
+    verify_conv(x_shape=(1, 1, 5, 5),
+                w_shape=(1, 1, 3, 3),
+                y_shape=(1, 1, 3, 3),
+                kernel_shape=[3, 3],
+                strides=[2, 2],
+                dilations=[1, 1],
+                padding=None,
+                auto_pad="SAME_UPPER")
+    # Conv1D
+    verify_conv(x_shape=(1, 1, 5),
+                w_shape=(1, 1, 3),
+                y_shape=(1, 1, 3),
+                kernel_shape=[3,],
+                strides=[2,],
+                dilations=[1,],
+                padding=None,
+                auto_pad="SAME_UPPER")
+
+    # Convolution with dilation
+    verify_conv(x_shape=(1, 1, 5, 5),
+                w_shape=(1, 1, 3, 3),
+                y_shape=(1, 1, 5, 5),
+                kernel_shape=[3, 3],
+                strides=[1, 1],
+                dilations=[2, 2],
+                padding=[2, 2, 2, 2])
+    # Conv1D
+    verify_conv(x_shape=(1, 1, 5),
+                w_shape=(1, 1, 3),
+                y_shape=(1, 1, 5),
+                kernel_shape=[3,],
+                strides=[1,],
+                dilations=[2,],
+                padding=[2, 2])
 
 Review comment:
   I've removed the keyword arguments in the latest commit.

[GitHub] [incubator-tvm] optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

optima2005 commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365054591
 
 

 ##########
 File path: tests/python/frontend/onnx/test_forward.py
 ##########
 @@ -1761,18 +1773,95 @@ def verify_conv(x_shape, w_shape, y_shape, p):
 
 def test_conv():
     # Convolution with padding
-    # (1, 1, 5, 5) input tensor
-    # (1, 1, 3, 3) tensor for convolution weights
-    # (1, 1, 5, 5) output tensor
-    # [1, 1, 1, 1] list for pads
-    verify_conv((1, 1, 5, 5), (1, 1, 3, 3), (1, 1, 5, 5), [1, 1, 1, 1])
+    # Conv2D
+    verify_conv(x_shape=(1, 1, 5, 5),
+                w_shape=(1, 1, 3, 3),
+                y_shape=(1, 1, 5, 5),
+                padding=[1, 1, 1, 1],
+                kernel_shape=[3, 3],
+                strides=[1, 1],
+                dilations=[1, 1])
+    # Conv1D
+    verify_conv(x_shape=(1, 1, 5),
+                w_shape=(1, 1, 3),
+                y_shape=(1, 1, 5),
+                padding=[1, 1],
+                kernel_shape=[3,],
+                strides=[1,],
+                dilations=[1,])
 
     # Convolution without padding
-    # (1, 1, 5, 5) input tensor
-    # (1, 1, 3, 3) tensor for convolution weights
-    # (1, 1, 3, 3) output tensor
-    # [0, 0, 0, 0] list for pads
-    verify_conv((1, 1, 5, 5), (1, 1, 3, 3), (1, 1, 3, 3), [0, 0, 0, 0])
+    # Conv2D
+    verify_conv(x_shape=(1, 1, 5, 5),
+                w_shape=(1, 1, 3, 3),
+                y_shape=(1, 1, 3, 3),
+                padding=[0, 0, 0, 0],
+                kernel_shape=[3, 3],
+                strides=[1, 1],
+                dilations=[1, 1])
+    # Conv1D
+    verify_conv(x_shape=(1, 1, 5),
+                w_shape=(1, 1, 3),
+                y_shape=(1, 1, 3),
+                padding=[0, 0],
+                kernel_shape=[3,],
+                strides=[1,],
+                dilations=[1,])
+
+    # Convolution with autopadding
+    verify_conv(x_shape=(1, 1, 5, 5),
+                w_shape=(1, 1, 3, 3),
+                y_shape=(1, 1, 5, 5),
+                kernel_shape=[3, 3],
+                strides=[1, 1],
+                dilations=[1, 1],
+                padding=None,
+                auto_pad="SAME_UPPER")
+    # Conv1D
+    verify_conv(x_shape=(1, 1, 5),
+                w_shape=(1, 1, 3),
+                y_shape=(1, 1, 5),
+                kernel_shape=[3,],
+                strides=[1,],
+                dilations=[1,],
+                padding=None,
+                auto_pad="SAME_UPPER")
+
+    # Convolution with non uniform stride
+    verify_conv(x_shape=(1, 1, 5, 5),
+                w_shape=(1, 1, 3, 3),
+                y_shape=(1, 1, 3, 3),
+                kernel_shape=[3, 3],
+                strides=[2, 2],
+                dilations=[1, 1],
+                padding=None,
+                auto_pad="SAME_UPPER")
+    # Conv1D
+    verify_conv(x_shape=(1, 1, 5),
+                w_shape=(1, 1, 3),
+                y_shape=(1, 1, 3),
+                kernel_shape=[3,],
+                strides=[2,],
+                dilations=[1,],
+                padding=None,
+                auto_pad="SAME_UPPER")
+
+    # Convolution with dilation
+    verify_conv(x_shape=(1, 1, 5, 5),
+                w_shape=(1, 1, 3, 3),
+                y_shape=(1, 1, 5, 5),
+                kernel_shape=[3, 3],
+                strides=[1, 1],
+                dilations=[2, 2],
+                padding=[2, 2, 2, 2])
+    # Conv1D
+    verify_conv(x_shape=(1, 1, 5),
+                w_shape=(1, 1, 3),
+                y_shape=(1, 1, 5),
+                kernel_shape=[3,],
+                strides=[1,],
+                dilations=[2,],
+                padding=[2, 2])
 
 Review comment:
   The parameter keywords are not necessary in those cases.

[GitHub] [incubator-tvm] masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

masahi commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r364752946
 
 

 ##########
 File path: src/relay/op/nn/convolution.cc
 ##########
 @@ -52,6 +50,67 @@ Array<Array<Layout> > ConvInferCorrectLayout(
                                    params->data_layout : params->out_layout}};
 }
 
+
+// relay.nn.conv1d
+TVM_REGISTER_NODE_TYPE(Conv1DAttrs);
+
+// Positional relay function to create conv1d operator
+// used by frontend FFI.
+Expr MakeConv1D(Expr data,
 
 Review comment:
   Can you clean up MakeConv1D, MakeConv2D, and MakeConv3D using a template, like we do in pooling.cc?

[GitHub] [incubator-tvm] jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D

Posted by GitBox <gi...@apache.org>.

jwfromm commented on a change in pull request #4639: [Relay/Topi][Op] Conv1D
URL: https://github.com/apache/incubator-tvm/pull/4639#discussion_r365379427
 
 

 ##########
 File path: topi/python/topi/cuda/conv1d.py
 ##########
 @@ -0,0 +1,308 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name, unused-argument
+"""Compute definition for conv1d with cuda backend"""
+import tvm
+from tvm import autotvm
+
+from .. import nn, generic
+from ..util import traverse_inline, get_const_tuple
+
+
+@autotvm.register_topi_compute(nn.conv1d, ['cuda', 'gpu'], ['direct'])
+def conv1d_cuda(cfg,
+                data,
+                kernel,
+                strides,
+                padding,
+                dilation,
+                layout='NCW',
+                out_dtype='float32'):
+    """ 1D convolution forward operator for cuda backend.
+
+    Parameters
+    ----------
+    cfg : ConfigEntity
+        The config for this template
+
+    data : tvm.Tensor
+        3-D input shape [batch, in_channel, in_width] for layout == 'NCW'
+        and [batch, in_width, in_channel] for layout == 'NWC'
+
+    kernel : tvm.Tensor
+        3-D kernel with shape [num_filter, in_channel, filter_size] for layout == 'NCW'
+        and [filter_size, in_channel, num_filter] for layout == 'NWC'
+
+    strides : int or tuple
+        The spatial stride along width
+
+    padding : int or str
+        Padding size, or ['VALID', 'SAME']
+
+    dilation : int or tuple
+        Dilation rate if convolution should be dilated.
+
+    layout : str
+        How input data is laid out, must be one of ['NCW', 'NWC']
+
+    out_dtype : str
+        The output data type. If None then output is same type as input.
+    """
+    if out_dtype is None:
+        out_dtype = data.dtype
+    if isinstance(strides, (tuple, list)):
+        strides = strides[0]
+    if isinstance(dilation, (tuple, list)):
+        dilation = dilation[0]
+
+    if layout == 'NCW':
+        return nn.conv1d_ncw(data, kernel, strides, padding, dilation,
+                             out_dtype)
+    if layout == 'NWC':
+        return nn.conv1d_nwc(data, kernel, strides, padding, dilation,
+                             out_dtype)
+    raise ValueError("This layout is not yet supported: {}".format(layout))
+
+
+@autotvm.register_topi_schedule(generic.schedule_conv1d_ncw, ["cuda", "gpu"],
+                                ["direct"])
+def schedule_conv1d_ncw(cfg, outs):
+    """TOPI schedule callback of conv1d ncw for cuda gpu
+
+    Parameters
+    ----------
+    cfg : ConfigEntity
+        the config for this template.
+
+    outs : Array of Tensor
+        The computation graph description of conv1d
+        in the format of an array of tensors.
+
+    Returns
+    -------
+    s : Schedule
+        The computation schedule for conv1d.
+    """
+    outs = [outs] if isinstance(outs, tvm.tensor.Tensor) else outs
+    s = tvm.create_schedule([x.op for x in outs])
+
+    def _callback(op):
+        if op.tag == 'conv1d_ncw':
+            pad_data = op.input_tensors[0]
+            kernel = op.input_tensors[1]
+            conv = op.output(0)
+
+            ##### space definition begin #####
+            n, f, x = s[conv].op.axis
+            rc = s[conv].op.reduce_axis[0]
+            cfg.define_split("tile_n", cfg.axis(n), num_outputs=4)
+            cfg.define_split("tile_f", cfg.axis(f), num_outputs=4)
+            cfg.define_split("tile_x", cfg.axis(x), num_outputs=4)
+            cfg.define_split("tile_rc", cfg.axis(rc), num_outputs=3)
+            cfg.define_knob("auto_unroll_max_step", [64, 512, 1500])
+
+            target = tvm.target.current_target()
+            if target.target_name in ['nvptx', 'rocm']:
+                cfg.define_knob("unroll_explicit", [1])
+            else:
+                cfg.define_knob("unroll_explicit", [0, 1])
+
+            ##### space definition end #####
+
+            if isinstance(kernel.op,
+                          tvm.tensor.ComputeOp) and 'dilate' in kernel.op.tag:
+                s[kernel].compute_inline()
+
+            if conv.op in s.outputs:
+                output = conv
+                OL = s.cache_write(conv, 'local')
+            else:
+                output = s.outputs[0].output(0)
+                s[conv].set_scope('local')
+                OL = conv
+
+            # create cache stage
+            s[pad_data].set_scope('shared')
+            AA = pad_data
+            WW = s.cache_read(kernel, 'shared', [OL])
+
+            # tile and bind spatial axes
+            n, f, x = s[output].op.axis
+            kernel_scope, n = s[output].split(n, nparts=1)
+            bn, vn, tn, ni = cfg["tile_n"].apply(s, output, n)
+            bf, vf, tf, fi = cfg["tile_f"].apply(s, output, f)
+            bx, vx, tx, xi = cfg["tile_x"].apply(s, output, x)
+
+            s[output].reorder(bn, bf, bx, vn, vf, vx, tn, tf, tx, ni, fi, xi)
+            s[output].bind(bn, tvm.thread_axis("blockIdx.z"))
+            s[output].bind(bf, tvm.thread_axis("blockIdx.y"))
+            s[output].bind(bx, tvm.thread_axis("blockIdx.x"))
+            s[output].bind(vn, tvm.thread_axis("vthread"))
+            s[output].bind(vf, tvm.thread_axis("vthread"))
+            s[output].bind(vx, tvm.thread_axis("vthread"))
+
+            s[output].bind(tx, tvm.thread_axis("threadIdx.x"))
+            s[OL].compute_at(s[output], tx)
+            # number of threads
+            n_tz = cfg["tile_n"].size[2] * cfg["tile_f"].size[2]
+            n_tx = cfg["tile_x"].size[2]
+
+            # tile reduction axes
+            n, f, x = s[OL].op.axis
+            rc, rx = s[OL].op.reduce_axis
+            rco, rcm, rci = cfg['tile_rc'].apply(s, OL, rc)
+            s[OL].reorder(rco, rcm, rx, rci, n, f, x)
+
+            s[AA].compute_at(s[OL], rx)
+            s[WW].compute_at(s[OL], rx)
+
+            # cooperative fetching
+            for load in [AA, WW]:
+                n, f, x = s[load].op.axis
+                fused = s[load].fuse(f, x)
+                tz, fused = s[load].split(fused, nparts=n_tz)
+                tx, fused = s[load].split(fused, nparts=n_tx)
+                s[load].bind(tz, tvm.thread_axis("threadIdx.y"))
+                s[load].bind(tx, tvm.thread_axis("threadIdx.x"))
+
+            s[output].pragma(kernel_scope, 'auto_unroll_max_step',
+                             cfg['auto_unroll_max_step'].val)
+            s[output].pragma(kernel_scope, 'unroll_explicit',
+                             cfg['unroll_explicit'].val)
+
+            N, CO, OW = get_const_tuple(output.shape)
+            _, CI, KW = get_const_tuple(kernel.shape)
+            cfg.add_flop(2 * N * OW * CO * KW * CI)
+
+    traverse_inline(s, outs[0].op, _callback)
+
+    return s
+
+
+@autotvm.register_topi_schedule(generic.schedule_conv1d_nwc, ["cuda", "gpu"],
+                                ["direct"])
+def schedule_conv1d_nwc(cfg, outs):
+    """TOPI schedule callback of conv1d nwc for cuda gpu
+
+    Parameters
+    ----------
+    cfg : ConfigEntity
+        the config for this template.
+
+    outs : Array of Tensor
+        The computation graph description of conv1d
+        in the format of an array of tensors.
+
+    Returns
+    -------
+    s : Schedule
+        The computation schedule for conv1d.
+    """
+    outs = [outs] if isinstance(outs, tvm.tensor.Tensor) else outs
+    s = tvm.create_schedule([x.op for x in outs])
+
+    def _callback(op):
+        if op.tag == 'conv1d_nwc':
+            pad_data = op.input_tensors[0]
+            kernel = op.input_tensors[1]
+            conv = op.output(0)
+
+            ##### space definition begin #####
+            n, x, f = s[conv].op.axis
+            rc = s[conv].op.reduce_axis[0]
+            cfg.define_split("tile_n", cfg.axis(n), num_outputs=4)
+            cfg.define_split("tile_x", cfg.axis(x), num_outputs=4)
+            cfg.define_split("tile_f", cfg.axis(f), num_outputs=4)
+            cfg.define_split("tile_rc", cfg.axis(rc), num_outputs=3)
+            cfg.define_knob("auto_unroll_max_step", [64, 512, 1500])
+
+            target = tvm.target.current_target()
+            if target.target_name in ['nvptx', 'rocm']:
+                cfg.define_knob("unroll_explicit", [1])
+            else:
+                cfg.define_knob("unroll_explicit", [0, 1])
+
+            ##### space definition end #####
+
+            if isinstance(kernel.op,
+                          tvm.tensor.ComputeOp) and 'dilate' in kernel.op.tag:
+                s[kernel].compute_inline()
+
+            if conv.op in s.outputs:
+                output = conv
+                OL = s.cache_write(conv, 'local')
+            else:
+                output = s.outputs[0].output(0)
+                s[conv].set_scope('local')
+                OL = conv
+
+            # create cache stage
+            s[pad_data].set_scope('shared')
+            AA = pad_data
+            WW = s.cache_read(kernel, 'shared', [OL])
+
+            # tile and bind spatial axes
+            n, x, f = s[output].op.axis
+            kernel_scope, n = s[output].split(n, nparts=1)
+            bn, vn, tn, ni = cfg["tile_n"].apply(s, output, n)
+            bx, vx, tx, xi = cfg["tile_x"].apply(s, output, x)
+            bf, vf, tf, fi = cfg["tile_f"].apply(s, output, f)
+
+            s[output].reorder(bn, bx, bf, vn, vx, vf, tn, tx, tf, ni, xi, fi)
+            s[output].bind(bn, tvm.thread_axis("blockIdx.z"))
+            s[output].bind(bx, tvm.thread_axis("blockIdx.y"))
+            s[output].bind(bf, tvm.thread_axis("blockIdx.x"))
+            s[output].bind(vn, tvm.thread_axis("vthread"))
+            s[output].bind(vx, tvm.thread_axis("vthread"))
+            s[output].bind(vf, tvm.thread_axis("vthread"))
+
+            s[output].bind(tf, tvm.thread_axis("threadIdx.x"))
+            s[OL].compute_at(s[output], tf)
+            # number of threads
+            n_tz = cfg["tile_n"].size[2] * cfg["tile_x"].size[2]
+            n_tx = cfg["tile_f"].size[2]
+
+            # tile reduction axes
+            n, x, f = s[OL].op.axis
+            rc, rx = s[OL].op.reduce_axis
+            rco, rcm, rci = cfg['tile_rc'].apply(s, OL, rc)
+            s[OL].reorder(rco, rcm, rx, rci, n, x, f)
+
+            s[AA].compute_at(s[OL], rx)
+            s[WW].compute_at(s[OL], rx)
+
+            # cooperative fetching
+            for load in [AA, WW]:
+                n, x, f = s[load].op.axis
+                fused = s[load].fuse(x, f)
+                tz, fused = s[load].split(fused, nparts=n_tz)
+                tx, fused = s[load].split(fused, nparts=n_tx)
+                s[load].bind(tz, tvm.thread_axis("threadIdx.y"))
+                s[load].bind(tx, tvm.thread_axis("threadIdx.x"))
+
+            s[output].pragma(kernel_scope, 'auto_unroll_max_step',
+                             cfg['auto_unroll_max_step'].val)
+            s[output].pragma(kernel_scope, 'unroll_explicit',
+                             cfg['unroll_explicit'].val)
+
+            N, OW, CO = get_const_tuple(output.shape)
+            KW, CI, _ = get_const_tuple(kernel.shape)
+            cfg.add_flop(2 * N * OW * CO * KW * CI)
+
+    traverse_inline(s, outs[0].op, _callback)
+
+    return s
 
 Review comment:
   Unfortunately, the templates frequently use shape decompositions that differ between the two layouts. Combining the schedules would require a bunch of ugly if/else statements and probably wouldn't save much space. Also keep in mind that these CUDA schedules are pretty basic and may not be optimal for each layout; keeping them separate makes it easier to improve one or the other in the future.
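
   As a rough illustration of the branching a merged schedule would need, here is a minimal sketch covering only the shape unpacking behind the FLOP count. The helper names `unpack_conv1d_shapes` and `conv1d_flops` are hypothetical and not part of this PR; the tuple orderings are the ones used in the two schedules above.

```python
def unpack_conv1d_shapes(layout, out_shape, kernel_shape):
    """Hypothetical helper: return (N, CO, OW, CI, KW) for either layout."""
    if layout == 'NCW':
        # Output is [batch, out_channel, out_width];
        # kernel is [num_filter, in_channel, filter_size].
        n, co, ow = out_shape
        _, ci, kw = kernel_shape
    elif layout == 'NWC':
        # Output is [batch, out_width, out_channel];
        # kernel is [filter_size, in_channel, num_filter].
        n, ow, co = out_shape
        kw, ci, _ = kernel_shape
    else:
        raise ValueError("This layout is not yet supported: {}".format(layout))
    return n, co, ow, ci, kw


def conv1d_flops(layout, out_shape, kernel_shape):
    """Hypothetical helper: same formula both schedules pass to cfg.add_flop."""
    n, co, ow, ci, kw = unpack_conv1d_shapes(layout, out_shape, kernel_shape)
    return 2 * n * ow * co * kw * ci
```

   The same kind of branch would have to be repeated for the axis unpacking, the tile order, the thread bindings, and the cooperative-fetch fusion, which is why the two schedules are kept separate here.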
