You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/10/06 15:59:16 UTC
[GitHub] [tvm-rfcs] Lunderberg commented on pull request #39: [RFC][TIR] Separate physical and logical layout of buffers

Lunderberg commented on pull request #39:
URL: https://github.com/apache/tvm-rfcs/pull/39#issuecomment-936560130


   Thank you for the comments, @vinx13.
   
   > For example, another way is to use a mapping function: (n, c, h, w) -> (n, tir.floordiv(c, 4), h, w, tir.floormod(c, 4)). This would allow arbitrary mapping (we can add more restrictions like requiring affine mapping though, to make analysis easier).
   
   ~~Having an arbitrary mapping function was something that I considered as an alternative, and would like to use as part of the internal representation.  However, for the user-facing API, I prefer having a representation that cannot express an illegal internal state, rather than relying on validation.  The arbitrary mapping function could have more than one logical index map to the same physical index (e.g. `(i,j) -> (i+j,)`), which would need to be caught after the fact.~~
   
   Never mind, I'm convinced after going through your point below about CUDA's permuted layout.
   
   > A possible use cases of more complex mapping is [permuted layout](https://github.com/NVIDIA/cutlass/blob/master/media/docs/implicit_gemm_convolution.md#shared-memory-layouts) for shared memory on CUDA.
   
   Thank you for the example, and this sort of example was what I was hoping to get in order to determine if the representation of reorder/splits is sufficiently flexible.  If I'm reading the link correctly, I agree that the physical layout in shared memory cannot be expressed in the proposed notation.  If the global memory is 2-d memory `(i,j)`, the closest to representing physical layout in shared memory would be either `((i,2), (j,4), (j,8), SEP, (i,4))`.  This would maintain the 128-bit vectors `(j,8)`, transpose because `(j,8)` appears in the first physical coordinate, but wouldn't handle the XOR condition needed to have the conflict-free access.  It also couldn't specify that the `(i,2)` in the first physical coordinate would need to take precedence over the `(i,4)` in the second physical coordinate.  In the functional notation, these would be `(i,j) -> (floormod(i,2), (floormod(row,8)) ^ (floordiv(j,8)), floormod(j,8), SEP, row)`, where `row = floordiv(i, 2)`.
   
   I am convinced, while the earlier notation may be useful as a more readable input in some cases, it doesn't cover all use cases.  At most, it could be an alternative form of input, but the function definition mapping from logical indices to physical indices should be the primary representation.
   
   In terms of how to represent this in the TIR graph, I'll have to think a bit on it.  I'm picturing either `BufferNode` holding `Array<Var> index_vars` and `Array<IterSumExpr> physical_layout`, or making a TIR variant of `tvm::relay::FunctionNode`.
   
   > Also, there are related affine analysis infrastructure available, it would be great if we can reuse it for loop analysis and rewriting.
   
   Thank you for mentioning these.  I intend to use these utilities for the implementation, and will add it to the detail section.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org