Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/10/04 18:26:21 UTC

[GitHub] [tvm-rfcs] areusch commented on a change in pull request #37: Introduce the Arm(R) Ethos(TM)-U Cascading Planner

areusch commented on a change in pull request #37:
URL: https://github.com/apache/tvm-rfcs/pull/37#discussion_r718677583



##########
File path: rfcs/0037-arm-ethosu-cascading-planner.md
##########
@@ -41,51 +41,17 @@ Deciding on exactly which operators should be cascaded and with what striping pa
 
 The key piece of information to calculate in order to characterize a cascade is how the stripe size changes throughout it. This is a function of the data dependency between an operator's inputs and outputs. For many operators that we're interested in, an affine transform matrix can represent this dependency if we express the input and output stripe sizes as vectors. Affine transforms typically use 'augmented' matrices and vectors (https://en.wikipedia.org/wiki/Affine_transformation#Augmented_matrix), which allow constant changes to be represented. Concretely, we define the transform matrix M as the matrix for which the following holds:
 
-$$stripe_{in} = {M} \cdot {stripe_{out}}$$
+![meta-schedule-workflow](../resources/cascading-formula-1.png)
 
 Let's briefly consider how to derive such a transform matrix for a 3x3 unstrided, undilated and unpadded NHWC convolution. Immediately, the '3x3' kernel tells us something important: a single element in the output depends on a 3x3 region in the height/width of the input. If we were instead to consider a 2x2 region of the output in the height/width dimensions, we'd need a 4x4 region in the input. So the rule is that we need 2 more elements in height and width when calculating the dependencies of an output stripe; more generally, this number is kernel_size-1 in each axis. Now consider the channels: in a convolution, no matter how many output elements you are computing, you will always need every input channel. This is because the input channel axis is a reduction axis in a convolution; in a sense, it isn't 'reflected' in the output. Combining these two observations, we arrive at the following transform matrix:
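
The matrix itself falls outside the quoted hunk, but as a rough sketch of what the derivation above implies (assuming augmented stripe vectors of the form [h, w, c, 1]^T with the batch axis omitted, and writing IC for the input channel count; both are illustrative choices, not necessarily the RFC's notation):

$$\begin{pmatrix} h_{in} \\ w_{in} \\ c_{in} \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & IC \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} h_{out} \\ w_{out} \\ c_{out} \\ 1 \end{pmatrix}$$

The constant column adds kernel_size-1 = 2 to the height and width stripes, and the zeroed channel row replaces the output channel count with the full input channel count, since channels are a reduction axis. A minimal numpy check of the same propagation (a sketch under the assumptions above, not code from the RFC):

    import numpy as np

    IC = 32  # assumed input channel count, chosen only for illustration

    # Affine transform for a 3x3 unstrided, undilated, unpadded convolution,
    # acting on augmented stripe vectors [h, w, c, 1].
    M = np.array([
        [1, 0, 0, 2],    # h_in = h_out + (kernel_h - 1)
        [0, 1, 0, 2],    # w_in = w_out + (kernel_w - 1)
        [0, 0, 0, IC],   # c_in = IC regardless of c_out (reduction axis)
        [0, 0, 0, 1],    # augmentation row
    ])

    stripe_out = np.array([2, 2, 8, 1])  # a 2x2x8 output stripe, augmented
    stripe_in = M @ stripe_out           # -> [4, 4, 32, 1], i.e. 4x4xIC
    print(stripe_in)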

Review comment:
       > Now to consider the channels, in a convolution no matter how many output elements you are computing you'll always need every input channel.
   
   just curious: what about depthwise convolutions?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org