Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/11/03 22:38:04 UTC
[GitHub] [incubator-tvm] alexgl-github opened a new pull request #6840: conv1d_transpose speedup
alexgl-github opened a new pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840
Thanks for contributing to TVM! Please refer to the contribution guidelines at https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in the pull request thread.
[GitHub] [incubator-tvm] alexgl-github edited a comment on pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
alexgl-github edited a comment on pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840#issuecomment-721412452
Speed up transposed 1-D convolution by eliminating unnecessary multiplications by the zero values that the dilated input contains when the stride is greater than 1.
Below are current vs proposed latency numbers for various transposed conv1d parameters:
latency current=0.06837 sec  new=0.00049 sec  channels_out=1    input_shape=(1, 257, 128)  kernel_size=512  strides=128  padding=256
latency current=0.53093 sec  new=0.10341 sec  channels_out=257  input_shape=(1, 257, 128)  kernel_size=512  strides=4    padding=256
latency current=0.00292 sec  new=0.00307 sec  channels_out=1    input_shape=(1, 257, 128)  kernel_size=512  strides=1    padding=256
latency current=0.00474 sec  new=0.00171 sec  channels_out=1    input_shape=(1, 257, 128)  kernel_size=512  strides=2    padding=256
latency current=0.00955 sec  new=0.00056 sec  channels_out=1    input_shape=(1, 257, 128)  kernel_size=512  strides=16   padding=256
latency current=0.00054 sec  new=0.00023 sec  channels_out=1    input_shape=(1, 1, 16384)  kernel_size=512  strides=2    padding=256
latency current=0.00385 sec  new=0.00089 sec  channels_out=4    input_shape=(1, 1, 16384)  kernel_size=512  strides=4    padding=256
latency current=0.00013 sec  new=0.00004 sec  channels_out=1    input_shape=(1, 1, 1024)   kernel_size=512  strides=5    padding=256
latency current=0.00002 sec  new=0.00002 sec  channels_out=32   input_shape=(1, 3, 224)    kernel_size=5    strides=1    padding=0
latency current=0.00004 sec  new=0.00003 sec  channels_out=32   input_shape=(1, 3, 224)    kernel_size=5    strides=2    padding=0
latency current=0.00006 sec  new=0.00003 sec  channels_out=128  input_shape=(1, 32, 32)    kernel_size=5    strides=2    padding=0
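To make the source of the savings concrete, here is a rough NumPy sketch (independent of the actual TE/TOPI code) contrasting the scatter definition of transposed 1-D convolution with the dilate-then-convolve lowering the current schedule mirrors; for stride > 1 most multiply-accumulates in the latter hit the inserted zeros, which is the work this change avoids:

    import numpy as np

    def tconv1d_scatter(x, w, stride):
        # direct definition (no padding): every input sample scatters a
        # scaled copy of the kernel into the output
        out = np.zeros((len(x) - 1) * stride + len(w))
        for i, v in enumerate(x):
            out[i * stride : i * stride + len(w)] += v * w
        return out

    def tconv1d_dilate_then_conv(x, w, stride):
        # equivalent lowering: insert stride-1 zeros between input samples,
        # then run a dense stride-1 correlation with the flipped kernel;
        # for stride > 1 most of the products below involve inserted zeros
        k = len(w)
        dilated = np.zeros((len(x) - 1) * stride + 1)
        dilated[::stride] = x
        padded = np.pad(dilated, (k - 1, k - 1))
        wf = w[::-1]
        return np.array([np.dot(padded[o:o + k], wf) for o in range(len(padded) - k + 1)])

    x, w = np.random.rand(128), np.random.rand(512)
    print(np.allclose(tconv1d_scatter(x, w, 4), tconv1d_dilate_then_conv(x, w, 4)))  # True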
[GitHub] [incubator-tvm] vinx13 commented on pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
vinx13 commented on pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840#issuecomment-723379937
Thanks @alexgl-github @anijain2305 @giuseros
[GitHub] [incubator-tvm] anijain2305 commented on pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
anijain2305 commented on pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840#issuecomment-721438682
@vinx13 Can you PTAL?
[GitHub] [incubator-tvm] alexgl-github commented on a change in pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
alexgl-github commented on a change in pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840#discussion_r517581190
##########
File path: python/tvm/topi/cuda/conv1d_transpose_ncw.py
##########
@@ -65,36 +66,43 @@ def conv1d_transpose_ncw(cfg, data, kernel, stride, padding, out_dtype, output_p
out_width = (inp_width - 1) * stride + kernel_size - pad_left - pad_right + output_padding
pad_left = kernel_size - 1 - pad_left
pad_right = kernel_size - 1 - pad_right + output_padding
+ padded_width = pad_left + inp_width + pad_right
dilated_width = stride * (inp_width - 1) + 1
- data = te.compute(
- (batch, inp_channels, pad_left + dilated_width + pad_right),
+ padded_dilated_width = pad_left + dilated_width + pad_right
+
+ padded_data = te.compute(
+ (batch, inp_channels, padded_width),
lambda n, c, x: tvm.tir.if_then_else(
- tvm.tir.all(
- x >= pad_left,
- x < pad_left + dilated_width,
- tvm.tir.indexmod(x - pad_left, stride).equal(0),
- ),
- data[n, c, tvm.tir.indexdiv(x - pad_left, stride)],
- tvm.tir.const(0.0, "float32"),
- ),
- name="data_pad",
- )
-
- dc = te.reduce_axis((0, inp_channels), name="dc")
- dw = te.reduce_axis((0, kernel_size), name="dw")
+ tvm.tir.all(x >= pad_left,
+ x < pad_left + inp_width),
+ data[n, c, x - pad_left],
+ tvm.tir.const(0., "float32")),
+ name='data_pad')
+
+ padded_kernel = te.compute(
+ (inp_channels, out_channels, kernel_size + stride - 1),
+ lambda ci, co, k: tvm.tir.if_then_else(
+ tvm.tir.all(k < kernel_size),
+ kernel[ci, co, kernel_size-k-1],
+ tvm.tir.const(0., "float32")),
+ name='kernel_pad')
+
+ ci = te.reduce_axis((0, inp_channels), name='ci')
+ k = te.reduce_axis((0, (kernel_size + stride - 1)//stride), name='k')
+ border = pad_left * (stride - 1)
+
data_out = te.compute(
(batch, out_channels, out_width),
- lambda b, c, w: te.sum(
- data[b, dc, w + dw].astype(out_dtype)
- * kernel[dc, c, kernel_size - 1 - dw].astype(out_dtype),
- axis=[dc, dw],
- ),
- tag="conv1d_transpose_ncw",
- )
+ lambda b, co, w: te.sum(
+ padded_data[b, ci, (border+w + stride - 1) // stride + k].astype(out_dtype) *
Review comment:
It works for padding=0, kernel_size=2, stride=2; I've added a test for this case.
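For what it's worth, here is a standalone NumPy check of that index arithmetic (reduced to one batch and a single input/output channel pair; it mirrors the expressions in the diff but is not the TOPI code itself, and Python's // and % stand in for tir.indexdiv / tir.indexmod):

    import numpy as np

    def tconv1d_ref(data, kernel, stride, pad):
        # scatter-style reference for 1-D transposed convolution
        full = np.zeros((len(data) - 1) * stride + len(kernel))
        for i, v in enumerate(data):
            full[i * stride : i * stride + len(kernel)] += v * kernel
        return full[pad : len(full) - pad]

    def tconv1d_padded_kernel(data, kernel, stride, pad):
        # indexing scheme from the diff: pad the (un-dilated) data once, pad the
        # flipped kernel to kernel_size + stride - 1, and reduce over
        # ceil(kernel_size / stride) taps per output element
        ks = len(kernel)
        pad_left = ks - 1 - pad
        padded_data = np.concatenate([np.zeros(pad_left), data, np.zeros(ks - 1 - pad)])
        padded_kernel = np.concatenate([kernel[::-1], np.zeros(stride - 1)])
        border = pad_left * (stride - 1)
        out = np.zeros((len(data) - 1) * stride + ks - 2 * pad)
        for w in range(len(out)):
            for k in range((ks + stride - 1) // stride):
                d = (border + w + stride - 1) // stride + k
                f = k * stride + (stride - w - border) % stride
                out[w] += padded_data[d] * padded_kernel[f]
        return out

    data, kernel = np.arange(1.0, 4.0), np.array([2.0, 3.0])
    print(tconv1d_ref(data, kernel, 2, 0))            # [2. 3. 4. 6. 6. 9.]
    print(tconv1d_padded_kernel(data, kernel, 2, 0))  # [2. 3. 4. 6. 6. 9.]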
[GitHub] [incubator-tvm] alexgl-github commented on a change in pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
alexgl-github commented on a change in pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840#discussion_r517580833
##########
File path: tests/python/topi/python/test_topi_conv1d_transpose_ncw.py
##########
@@ -94,6 +94,9 @@ def test_conv1d_transpose_ncw():
verify_conv1d_transpose_ncw(1, 1, 10, 1, 5, 1, (0, 3), (0,))
verify_conv1d_transpose_ncw(1, 1, 10, 1, 5, 1, (1, 3), (0,))
verify_conv1d_transpose_ncw(1, 1, 10, 1, 5, 1, (2, 3), (0,))
+ verify_conv1d_transpose_ncw(1, 257, 128, 1, 512, 128, 256, (0,))
+ verify_conv1d_transpose_ncw(1, 257, 128, 2, 512, 128, 256, (0,))
+ verify_conv1d_transpose_ncw(1, 257, 128, 257, 512, 128, 256, (0,))
Review comment:
Added more tests with kernel_size=stride
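For illustration, such a case might be exercised with calls of the following shape (hypothetical parameter values, using the verify_conv1d_transpose_ncw signature from the test above: batch, in_channels, in_width, out_channels, kernel_size, stride, padding, output_padding):

    verify_conv1d_transpose_ncw(1, 257, 128, 1, 128, 128, 0, (0,))
    verify_conv1d_transpose_ncw(1, 257, 128, 2, 128, 128, 0, (0,))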
[GitHub] [incubator-tvm] vinx13 merged pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
vinx13 merged pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840
[GitHub] [incubator-tvm] alexgl-github commented on a change in pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
alexgl-github commented on a change in pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840#discussion_r517582147
##########
File path: python/tvm/topi/cuda/conv1d_transpose_ncw.py
##########
@@ -65,36 +66,43 @@ def conv1d_transpose_ncw(cfg, data, kernel, stride, padding, out_dtype, output_p
out_width = (inp_width - 1) * stride + kernel_size - pad_left - pad_right + output_padding
pad_left = kernel_size - 1 - pad_left
pad_right = kernel_size - 1 - pad_right + output_padding
+ padded_width = pad_left + inp_width + pad_right
dilated_width = stride * (inp_width - 1) + 1
- data = te.compute(
- (batch, inp_channels, pad_left + dilated_width + pad_right),
+ padded_dilated_width = pad_left + dilated_width + pad_right
+
+ padded_data = te.compute(
+ (batch, inp_channels, padded_width),
lambda n, c, x: tvm.tir.if_then_else(
- tvm.tir.all(
- x >= pad_left,
- x < pad_left + dilated_width,
- tvm.tir.indexmod(x - pad_left, stride).equal(0),
- ),
- data[n, c, tvm.tir.indexdiv(x - pad_left, stride)],
- tvm.tir.const(0.0, "float32"),
- ),
- name="data_pad",
- )
-
- dc = te.reduce_axis((0, inp_channels), name="dc")
- dw = te.reduce_axis((0, kernel_size), name="dw")
+ tvm.tir.all(x >= pad_left,
+ x < pad_left + inp_width),
+ data[n, c, x - pad_left],
+ tvm.tir.const(0., "float32")),
+ name='data_pad')
+
+ padded_kernel = te.compute(
+ (inp_channels, out_channels, kernel_size + stride - 1),
+ lambda ci, co, k: tvm.tir.if_then_else(
+ tvm.tir.all(k < kernel_size),
+ kernel[ci, co, kernel_size-k-1],
+ tvm.tir.const(0., "float32")),
+ name='kernel_pad')
+
+ ci = te.reduce_axis((0, inp_channels), name='ci')
+ k = te.reduce_axis((0, (kernel_size + stride - 1)//stride), name='k')
Review comment:
I don't think the ceil of a floating-point division can be used in reduce_axis. There's no integer equivalent for ceil in tir.op, so I've replaced it with tir.indexdiv(...).
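As a side note, a minimal sketch of the integer ceiling-division idiom in question (the usual `(a + b - 1) // b` trick; `tvm.tir.indexdiv` floors, so adding `stride - 1` before dividing gives a ceiling for positive operands):

    import tvm
    from tvm import te

    # Python-side constants: (a + b - 1) // b == ceil(a / b) for positive integers
    kernel_size, stride = 512, 128
    assert (kernel_size + stride - 1) // stride == -(-kernel_size // stride) == 4

    # the same idiom spelled with TIR ops when the operands are expressions
    ks, st = te.var("kernel_size"), te.var("stride")
    extent = tvm.tir.indexdiv(ks + st - 1, st)  # == ceil(ks / st) for positive ks, st
    k = te.reduce_axis((0, extent), name="k")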
[GitHub] [incubator-tvm] giuseros commented on a change in pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840#discussion_r517271380
##########
File path: tests/python/topi/python/test_topi_conv1d_transpose_ncw.py
##########
@@ -94,6 +94,9 @@ def test_conv1d_transpose_ncw():
verify_conv1d_transpose_ncw(1, 1, 10, 1, 5, 1, (0, 3), (0,))
verify_conv1d_transpose_ncw(1, 1, 10, 1, 5, 1, (1, 3), (0,))
verify_conv1d_transpose_ncw(1, 1, 10, 1, 5, 1, (2, 3), (0,))
+ verify_conv1d_transpose_ncw(1, 257, 128, 1, 512, 128, 256, (0,))
+ verify_conv1d_transpose_ncw(1, 257, 128, 2, 512, 128, 256, (0,))
+ verify_conv1d_transpose_ncw(1, 257, 128, 257, 512, 128, 256, (0,))
Review comment:
Could you add a test for the case kernel_size == stride? I think this is common in encoder/decoder networks.
##########
File path: python/tvm/topi/cuda/conv1d_transpose_ncw.py
##########
@@ -65,36 +66,43 @@ def conv1d_transpose_ncw(cfg, data, kernel, stride, padding, out_dtype, output_p
out_width = (inp_width - 1) * stride + kernel_size - pad_left - pad_right + output_padding
pad_left = kernel_size - 1 - pad_left
pad_right = kernel_size - 1 - pad_right + output_padding
+ padded_width = pad_left + inp_width + pad_right
dilated_width = stride * (inp_width - 1) + 1
- data = te.compute(
- (batch, inp_channels, pad_left + dilated_width + pad_right),
+ padded_dilated_width = pad_left + dilated_width + pad_right
+
+ padded_data = te.compute(
+ (batch, inp_channels, padded_width),
lambda n, c, x: tvm.tir.if_then_else(
- tvm.tir.all(
- x >= pad_left,
- x < pad_left + dilated_width,
- tvm.tir.indexmod(x - pad_left, stride).equal(0),
- ),
- data[n, c, tvm.tir.indexdiv(x - pad_left, stride)],
- tvm.tir.const(0.0, "float32"),
- ),
- name="data_pad",
- )
-
- dc = te.reduce_axis((0, inp_channels), name="dc")
- dw = te.reduce_axis((0, kernel_size), name="dw")
+ tvm.tir.all(x >= pad_left,
+ x < pad_left + inp_width),
+ data[n, c, x - pad_left],
+ tvm.tir.const(0., "float32")),
+ name='data_pad')
+
+ padded_kernel = te.compute(
+ (inp_channels, out_channels, kernel_size + stride - 1),
+ lambda ci, co, k: tvm.tir.if_then_else(
+ tvm.tir.all(k < kernel_size),
+ kernel[ci, co, kernel_size-k-1],
+ tvm.tir.const(0., "float32")),
+ name='kernel_pad')
+
+ ci = te.reduce_axis((0, inp_channels), name='ci')
+ k = te.reduce_axis((0, (kernel_size + stride - 1)//stride), name='k')
+ border = pad_left * (stride - 1)
+
data_out = te.compute(
(batch, out_channels, out_width),
- lambda b, c, w: te.sum(
- data[b, dc, w + dw].astype(out_dtype)
- * kernel[dc, c, kernel_size - 1 - dw].astype(out_dtype),
- axis=[dc, dw],
- ),
- tag="conv1d_transpose_ncw",
- )
+ lambda b, co, w: te.sum(
+ padded_data[b, ci, (border+w + stride - 1) // stride + k].astype(out_dtype) *
+ padded_kernel[ci, co, k*stride + tvm.tir.indexmod(stride-w-border, stride)].astype(out_dtype),
+ axis=[ci, k]), tag="conv1d_transpose_ncw")
Review comment:
Could you add a few more comments about the algorithm you are using?
##########
File path: python/tvm/topi/cuda/conv1d_transpose_ncw.py
##########
@@ -65,36 +66,43 @@ def conv1d_transpose_ncw(cfg, data, kernel, stride, padding, out_dtype, output_p
out_width = (inp_width - 1) * stride + kernel_size - pad_left - pad_right + output_padding
pad_left = kernel_size - 1 - pad_left
pad_right = kernel_size - 1 - pad_right + output_padding
+ padded_width = pad_left + inp_width + pad_right
dilated_width = stride * (inp_width - 1) + 1
- data = te.compute(
- (batch, inp_channels, pad_left + dilated_width + pad_right),
+ padded_dilated_width = pad_left + dilated_width + pad_right
+
+ padded_data = te.compute(
+ (batch, inp_channels, padded_width),
lambda n, c, x: tvm.tir.if_then_else(
- tvm.tir.all(
- x >= pad_left,
- x < pad_left + dilated_width,
- tvm.tir.indexmod(x - pad_left, stride).equal(0),
- ),
- data[n, c, tvm.tir.indexdiv(x - pad_left, stride)],
- tvm.tir.const(0.0, "float32"),
- ),
- name="data_pad",
- )
-
- dc = te.reduce_axis((0, inp_channels), name="dc")
- dw = te.reduce_axis((0, kernel_size), name="dw")
+ tvm.tir.all(x >= pad_left,
+ x < pad_left + inp_width),
+ data[n, c, x - pad_left],
+ tvm.tir.const(0., "float32")),
+ name='data_pad')
+
+ padded_kernel = te.compute(
+ (inp_channels, out_channels, kernel_size + stride - 1),
+ lambda ci, co, k: tvm.tir.if_then_else(
+ tvm.tir.all(k < kernel_size),
+ kernel[ci, co, kernel_size-k-1],
+ tvm.tir.const(0., "float32")),
+ name='kernel_pad')
+
+ ci = te.reduce_axis((0, inp_channels), name='ci')
+ k = te.reduce_axis((0, (kernel_size + stride - 1)//stride), name='k')
+ border = pad_left * (stride - 1)
+
data_out = te.compute(
(batch, out_channels, out_width),
- lambda b, c, w: te.sum(
- data[b, dc, w + dw].astype(out_dtype)
- * kernel[dc, c, kernel_size - 1 - dw].astype(out_dtype),
- axis=[dc, dw],
- ),
- tag="conv1d_transpose_ncw",
- )
+ lambda b, co, w: te.sum(
+ padded_data[b, ci, (border+w + stride - 1) // stride + k].astype(out_dtype) *
Review comment:
Are you sure this works for padding=0, kernel_size=2, stride=2?
In this case `ceil(w, stride)` gives 0, 1, 1, 2, 2, ..., while *I think* (but I might be wrong) you want 0, 0, 1, 1, 2, 2, etc. when you index the `padded_data` tensor.
##########
File path: python/tvm/topi/cuda/conv1d_transpose_ncw.py
##########
@@ -65,36 +66,43 @@ def conv1d_transpose_ncw(cfg, data, kernel, stride, padding, out_dtype, output_p
out_width = (inp_width - 1) * stride + kernel_size - pad_left - pad_right + output_padding
pad_left = kernel_size - 1 - pad_left
pad_right = kernel_size - 1 - pad_right + output_padding
+ padded_width = pad_left + inp_width + pad_right
dilated_width = stride * (inp_width - 1) + 1
- data = te.compute(
- (batch, inp_channels, pad_left + dilated_width + pad_right),
+ padded_dilated_width = pad_left + dilated_width + pad_right
+
+ padded_data = te.compute(
+ (batch, inp_channels, padded_width),
lambda n, c, x: tvm.tir.if_then_else(
- tvm.tir.all(
- x >= pad_left,
- x < pad_left + dilated_width,
- tvm.tir.indexmod(x - pad_left, stride).equal(0),
- ),
- data[n, c, tvm.tir.indexdiv(x - pad_left, stride)],
- tvm.tir.const(0.0, "float32"),
- ),
- name="data_pad",
- )
-
- dc = te.reduce_axis((0, inp_channels), name="dc")
- dw = te.reduce_axis((0, kernel_size), name="dw")
+ tvm.tir.all(x >= pad_left,
+ x < pad_left + inp_width),
+ data[n, c, x - pad_left],
+ tvm.tir.const(0., "float32")),
+ name='data_pad')
+
+ padded_kernel = te.compute(
+ (inp_channels, out_channels, kernel_size + stride - 1),
+ lambda ci, co, k: tvm.tir.if_then_else(
+ tvm.tir.all(k < kernel_size),
+ kernel[ci, co, kernel_size-k-1],
+ tvm.tir.const(0., "float32")),
+ name='kernel_pad')
+
+ ci = te.reduce_axis((0, inp_channels), name='ci')
+ k = te.reduce_axis((0, (kernel_size + stride - 1)//stride), name='k')
Review comment:
Instead of writing `(kernel_size + stride - 1)//stride`, could you write `ceil(kernel_size/stride)`? In the end it would make the code more understandable, and it wouldn't hurt performance.
[GitHub] [incubator-tvm] anijain2305 edited a comment on pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
anijain2305 edited a comment on pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840#issuecomment-721438682
@vinx13 Can you PTAL for CUDA stuff?
[GitHub] [incubator-tvm] alexgl-github commented on a change in pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
alexgl-github commented on a change in pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840#discussion_r517582553
##########
File path: python/tvm/topi/cuda/conv1d_transpose_ncw.py
##########
@@ -65,36 +66,43 @@ def conv1d_transpose_ncw(cfg, data, kernel, stride, padding, out_dtype, output_p
out_width = (inp_width - 1) * stride + kernel_size - pad_left - pad_right + output_padding
pad_left = kernel_size - 1 - pad_left
pad_right = kernel_size - 1 - pad_right + output_padding
+ padded_width = pad_left + inp_width + pad_right
dilated_width = stride * (inp_width - 1) + 1
- data = te.compute(
- (batch, inp_channels, pad_left + dilated_width + pad_right),
+ padded_dilated_width = pad_left + dilated_width + pad_right
+
+ padded_data = te.compute(
+ (batch, inp_channels, padded_width),
lambda n, c, x: tvm.tir.if_then_else(
- tvm.tir.all(
- x >= pad_left,
- x < pad_left + dilated_width,
- tvm.tir.indexmod(x - pad_left, stride).equal(0),
- ),
- data[n, c, tvm.tir.indexdiv(x - pad_left, stride)],
- tvm.tir.const(0.0, "float32"),
- ),
- name="data_pad",
- )
-
- dc = te.reduce_axis((0, inp_channels), name="dc")
- dw = te.reduce_axis((0, kernel_size), name="dw")
+ tvm.tir.all(x >= pad_left,
+ x < pad_left + inp_width),
+ data[n, c, x - pad_left],
+ tvm.tir.const(0., "float32")),
+ name='data_pad')
+
+ padded_kernel = te.compute(
+ (inp_channels, out_channels, kernel_size + stride - 1),
+ lambda ci, co, k: tvm.tir.if_then_else(
+ tvm.tir.all(k < kernel_size),
+ kernel[ci, co, kernel_size-k-1],
+ tvm.tir.const(0., "float32")),
+ name='kernel_pad')
+
+ ci = te.reduce_axis((0, inp_channels), name='ci')
+ k = te.reduce_axis((0, (kernel_size + stride - 1)//stride), name='k')
+ border = pad_left * (stride - 1)
+
data_out = te.compute(
(batch, out_channels, out_width),
- lambda b, c, w: te.sum(
- data[b, dc, w + dw].astype(out_dtype)
- * kernel[dc, c, kernel_size - 1 - dw].astype(out_dtype),
- axis=[dc, dw],
- ),
- tag="conv1d_transpose_ncw",
- )
+ lambda b, co, w: te.sum(
+ padded_data[b, ci, (border+w + stride - 1) // stride + k].astype(out_dtype) *
+ padded_kernel[ci, co, k*stride + tvm.tir.indexmod(stride-w-border, stride)].astype(out_dtype),
+ axis=[ci, k]), tag="conv1d_transpose_ncw")
Review comment:
Added notes about the index calculation
[GitHub] [incubator-tvm] alexgl-github commented on pull request #6840: conv1d_transpose speedup
Posted by GitBox <gi...@apache.org>.
alexgl-github commented on pull request #6840:
URL: https://github.com/apache/incubator-tvm/pull/6840#issuecomment-721412452
Speed up transposed 1-D convolution by eliminating unnecessary multiplications by the zero values that the dilated input contains when the stride is greater than 1.
Below are current vs proposed latency numbers for various transposed conv1d parameters:
current              new
latency=0.06837 sec  latency=0.00049 sec  channels_out=1    input_shape=(1, 257, 128)  kernel_size=512  strides=128  padding=256
latency=0.53093 sec  latency=0.10341 sec  channels_out=257  input_shape=(1, 257, 128)  kernel_size=512  strides=4    padding=256
latency=0.00292 sec  latency=0.00307 sec  channels_out=1    input_shape=(1, 257, 128)  kernel_size=512  strides=1    padding=256
latency=0.00474 sec  latency=0.00171 sec  channels_out=1    input_shape=(1, 257, 128)  kernel_size=512  strides=2    padding=256
latency=0.00955 sec  latency=0.00056 sec  channels_out=1    input_shape=(1, 257, 128)  kernel_size=512  strides=16   padding=256
latency=0.00054 sec  latency=0.00023 sec  channels_out=1    input_shape=(1, 1, 16384)  kernel_size=512  strides=2    padding=256
latency=0.00385 sec  latency=0.00089 sec  channels_out=4    input_shape=(1, 1, 16384)  kernel_size=512  strides=4    padding=256
latency=0.00013 sec  latency=0.00004 sec  channels_out=1    input_shape=(1, 1, 1024)   kernel_size=512  strides=5    padding=256
latency=0.00002 sec  latency=0.00002 sec  channels_out=32   input_shape=(1, 3, 224)    kernel_size=5    strides=1    padding=0
latency=0.00004 sec  latency=0.00003 sec  channels_out=32   input_shape=(1, 3, 224)    kernel_size=5    strides=2    padding=0
latency=0.00006 sec  latency=0.00003 sec  channels_out=128  input_shape=(1, 32, 32)    kernel_size=5    strides=2    padding=0