Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/12/07 07:25:23 UTC

[GitHub] [tvm] elvin-n opened a new pull request, #13573: [Adreno] Add global pooling schedule

elvin-n opened a new pull request, #13573:
URL: https://github.com/apache/tvm/pull/13573

   The parallelism opportunities for global pooling are limited by the number of channels; the schedule needs to be changed to get parallelism over the reduction axis by using rfactor.
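
A minimal sketch of the rfactor idea the description refers to (not the PR's actual Adreno schedule; the shapes, the split factor of 64, and the thread/block bindings below are illustrative). It builds a global sum pool in TE, fuses the H*W reduction, splits it, and applies rfactor so threads parallelize over the reduction axis rather than over the few channels:

```python
import tvm
from tvm import te

# Global sum pool over a small NCHW tensor; with only 32 channels, binding
# channels alone would leave most GPU threads idle.
n, c, h, w = 1, 32, 56, 56
data = te.placeholder((n, c, h, w), name="data")
rh = te.reduce_axis((0, h), name="rh")
rw = te.reduce_axis((0, w), name="rw")
pool_sum = te.compute(
    (n, c, 1, 1),
    lambda b, ch, i, j: te.sum(data[b, ch, rh, rw], axis=[rh, rw]),
    name="pool_sum",
)

s = te.create_schedule(pool_sum.op)

# Fuse both reduction axes, split off an inner chunk, and rfactor it so the
# inner chunk becomes a data-parallel axis of a partial-sum tensor.
fused = s[pool_sum].fuse(*s[pool_sum].op.reduce_axis)
_, ri = s[pool_sum].split(fused, factor=64)
rf = s.rfactor(pool_sum, ri)

# Bind batch/channel to blocks and the remaining (factored) reduction to
# threadIdx.x: each work-group now reduces one output element cooperatively.
b, ch, i, j = s[pool_sum].op.axis
s[pool_sum].bind(s[pool_sum].fuse(b, ch), te.thread_axis("blockIdx.x"))
tx = te.thread_axis("threadIdx.x")
s[pool_sum].bind(s[pool_sum].op.reduce_axis[0], tx)
s[rf].compute_at(s[pool_sum], s[pool_sum].op.reduce_axis[0])
s[pool_sum].set_store_predicate(tx.var.equal(0))

print(tvm.lower(s, [data, pool_sum], simple_mode=True))
```

This is the same fuse/split/rfactor pattern used by the generic CUDA reduction schedules in topi; the PR applies the idea to the Adreno pooling schedule with its own heuristics and memory scopes.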


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] tvm-bot commented on pull request #13573: [Adreno] Add global pooling schedule

Posted by GitBox <gi...@apache.org>.
tvm-bot commented on PR #13573:
URL: https://github.com/apache/tvm/pull/13573#issuecomment-1340510029

   Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in a comment.

    * cc @echuraev (see [#10317](https://github.com/apache/tvm/issues/10317) for details)

   Generated by [tvm-bot](https://github.com/apache/tvm/blob/main/ci/README.md#github-actions)




[GitHub] [tvm] echuraev merged pull request #13573: [Adreno] Add global pooling schedule

Posted by GitBox <gi...@apache.org>.
echuraev merged PR #13573:
URL: https://github.com/apache/tvm/pull/13573




[GitHub] [tvm] echuraev commented on a diff in pull request #13573: [Adreno] Add global pooling schedule

Posted by GitBox <gi...@apache.org>.
echuraev commented on code in PR #13573:
URL: https://github.com/apache/tvm/pull/13573#discussion_r1041923962


##########
python/tvm/topi/adreno/pooling.py:
##########
@@ -19,6 +19,115 @@
 import tvm
 from tvm import te
 from .. import tag
+from .utils import get_div
+
+
+def schedule_adaptive_pool(outs, layout="NCHW"):
+    """Schedule for adaptive_pool.
+
+    Parameters
+    ----------
+    outs: Array of Tensor
+        The computation graph description of adaptive_pool
+        in the format of an array of tensors.
+
+    Returns
+    -------
+    s: Schedule
+        The computation schedule for adaptive_pool.
+    """
+    outs = [outs] if isinstance(outs, te.tensor.Tensor) else outs
+    s = te.create_schedule([x.op for x in outs])
+
+    def _schedule_global(Pool, layout):
+        # examples of latest pool op is global max pool and non latest is global avg pooling
+        # OL - an Expr will be used for rfactor
+        # Out - programming of the parallelizm on the global level
+        # shared is not required, local could be enough but shared scope gives quite significant
+        # perf boost
+        if Pool.op in s.outputs:
+            Out = Pool
+            OL = s.cache_write(Pool, "shared")
+        else:
+            Out = outs[0].op.output(0)
+            s[Pool].set_scope("shared")
+            OL = Pool
+
+        PaddedInput = Pool.op.input_tensors[0]
+
+        # detect axis for later reorder and binding of batch/chennel to blocks and

Review Comment:
   ```suggestion
           # detect axis for later reorder and binding of batch/channel to blocks and
   ```



##########
python/tvm/topi/adreno/pooling.py:
##########
@@ -19,6 +19,115 @@
 import tvm
 from tvm import te
 from .. import tag
+from .utils import get_div
+
+
+def schedule_adaptive_pool(outs, layout="NCHW"):
+    """Schedule for adaptive_pool.
+
+    Parameters
+    ----------
+    outs: Array of Tensor
+        The computation graph description of adaptive_pool
+        in the format of an array of tensors.
+
+    Returns
+    -------
+    s: Schedule
+        The computation schedule for adaptive_pool.
+    """
+    outs = [outs] if isinstance(outs, te.tensor.Tensor) else outs
+    s = te.create_schedule([x.op for x in outs])
+
+    def _schedule_global(Pool, layout):
+        # examples of latest pool op is global max pool and non latest is global avg pooling
+        # OL - an Expr will be used for rfactor
+        # Out - programming of the parallelizm on the global level
+        # shared is not required, local could be enough but shared scope gives quite significant
+        # perf boost
+        if Pool.op in s.outputs:
+            Out = Pool
+            OL = s.cache_write(Pool, "shared")
+        else:
+            Out = outs[0].op.output(0)
+            s[Pool].set_scope("shared")
+            OL = Pool
+
+        PaddedInput = Pool.op.input_tensors[0]
+
+        # detect axis for later reorder and binding of batch/chennel to blocks and
+        # spatial to threads
+        if layout in ("NCHW", "NCHW4c"):
+            channel_index = 1
+            height_index = 2
+            width_index = 3
+        else:
+            channel_index = 3
+            height_index = 1
+            width_index = 2
+
+        if isinstance(PaddedInput.op, tvm.te.ComputeOp):
+            s[PaddedInput].compute_inline()
+
+        fused_reduce = s[OL].fuse(
+            *[s[OL].op.reduce_axis[i] for i in range(len(s[OL].op.reduce_axis))]
+        )
+
+        spatial = PaddedInput.shape[height_index].value * PaddedInput.shape[width_index].value
+        max_threads = spatial // 25 if spatial > 25 else 1
+        max_threads = 256 if max_threads > 256 else max_threads

Review Comment:
   Why 25 and 256?
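
For readers skimming the thread, here is what the quoted heuristic computes for two typical global-pool inputs. Reading the constants as roughly 25 reduction elements per thread with a cap of 256 threads is an assumption and does not answer the reviewer's question about where those numbers come from:

```python
# Reproduces the two quoted lines for illustrative input sizes.
for height, width in [(7, 7), (56, 56)]:
    spatial = height * width
    max_threads = spatial // 25 if spatial > 25 else 1
    max_threads = 256 if max_threads > 256 else max_threads
    print(spatial, max_threads)  # prints "49 1", then "3136 125"
```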



##########
python/tvm/relay/op/strategy/adreno.py:
##########
@@ -215,6 +215,13 @@ def schedule_reduce_adreno(attrs, outs, target):
         return topi.adreno.schedule_reduce(outs)
 
 
+@schedule_adaptive_pool.register(["adreno"])
+def schedule_adaptive_pool_cuda(attrs, outs, target):
+    """schedule adaptive pooling ops for cuda"""

Review Comment:
   ```suggestion
   def schedule_adaptive_pool_adreno(attrs, outs, target):
       """schedule adaptive pooling ops for adreno"""
   ```
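
For context, the registration in this hunk is what makes TVM pick the Adreno pooling schedule whenever the compilation target carries the `adreno` key. A hedged end-to-end sketch follows; it assumes a TVM build with OpenCL enabled, the target string follows TVM's Adreno documentation, and the input shape is illustrative:

```python
import tvm
from tvm import relay

# Tiny Relay module with a global average pool, one of the ops covered by
# this PR's adaptive-pool schedule.
data = relay.var("data", shape=(1, 32, 56, 56), dtype="float32")
out = relay.nn.global_avg_pool2d(data)
mod = tvm.IRModule.from_expr(relay.Function([data], out))

# "opencl -device=adreno" adds the "adreno" key to the target, so the
# strategy registered above is dispatched instead of the generic GPU one.
# For on-device deployment the host would be an Android/aarch64 triple.
target = tvm.target.Target("opencl -device=adreno", host="llvm")

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)
```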



##########
python/tvm/topi/adreno/pooling.py:
##########
@@ -19,6 +19,115 @@
 import tvm
 from tvm import te
 from .. import tag
+from .utils import get_div
+
+
+def schedule_adaptive_pool(outs, layout="NCHW"):
+    """Schedule for adaptive_pool.
+
+    Parameters
+    ----------
+    outs: Array of Tensor
+        The computation graph description of adaptive_pool
+        in the format of an array of tensors.
+
+    Returns
+    -------
+    s: Schedule
+        The computation schedule for adaptive_pool.
+    """
+    outs = [outs] if isinstance(outs, te.tensor.Tensor) else outs
+    s = te.create_schedule([x.op for x in outs])
+
+    def _schedule_global(Pool, layout):
+        # examples of latest pool op is global max pool and non latest is global avg pooling
+        # OL - an Expr will be used for rfactor
+        # Out - programming of the parallelizm on the global level
+        # shared is not required, local could be enough but shared scope gives quite significant
+        # perf boost
+        if Pool.op in s.outputs:
+            Out = Pool
+            OL = s.cache_write(Pool, "shared")
+        else:
+            Out = outs[0].op.output(0)
+            s[Pool].set_scope("shared")
+            OL = Pool
+
+        PaddedInput = Pool.op.input_tensors[0]
+
+        # detect axis for later reorder and binding of batch/chennel to blocks and
+        # spatial to threads
+        if layout in ("NCHW", "NCHW4c"):
+            channel_index = 1
+            height_index = 2
+            width_index = 3
+        else:
+            channel_index = 3
+            height_index = 1
+            width_index = 2
+
+        if isinstance(PaddedInput.op, tvm.te.ComputeOp):
+            s[PaddedInput].compute_inline()
+
+        fused_reduce = s[OL].fuse(
+            *[s[OL].op.reduce_axis[i] for i in range(len(s[OL].op.reduce_axis))]
+        )

Review Comment:
   ```suggestion
           fused_reduce = s[OL].fuse(*s[OL].op.reduce_axis)
   ```
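
As an aside on this suggestion: `fuse` takes the iteration variables as positional arguments, and `op.reduce_axis` is already the sequence of reduction axes, so unpacking it with `*` passes exactly the same axes as the index-based comprehension, only more concisely.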


