You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/07/06 21:25:11 UTC

[GitHub] [tvm] junrushao1994 commented on pull request #12019: [TIR] Revert #11428 and move loop dependent alloc extent check after region union

junrushao1994 commented on PR #12019:
URL: https://github.com/apache/tvm/pull/12019#issuecomment-1176764243

   Moving our discussion thread from #11428. Thanks @ArmageddonKnight for spotting this nasty corner case, and @comaniac for offline discussion :-)
   
   @junrushao1994:
   
   > This PR introduces a regression with those lines: https://github.com/apache/tvm/blob/d4be49aec62299275565066b56a0555bafc2ccac/src/tir/transforms/compact_buffer_region.cc#L77-L85
   > 
   > Consider a case below:
   > 
   > ```python
   > extent = 960 - max(bx // 18 * 128, 832)
   > where bx \in [0, 144)
   > ```
   > 
   > Even if `bx` exists in `extent`, the upper bound of the allocation is not affected (which is 128). However, the check there introduces a false positive which falls back to a much more conservative estimate ([full region](https://github.com/apache/tvm/blob/d4be49aec62299275565066b56a0555bafc2ccac/src/tir/transforms/compact_buffer_region.cc#L77-L85)).
   
   @junrushao1994 
   
   > A minimal reproducible example: https://gist.github.com/junrushao1994/943eb0e9232a21a2748eb35af0fac4ff
   > 
   > The region of `A_shared` should be [128, 4] instead of [192, 4]
   
   @junrushao1994 
   
   > so let me briefly introduce the workflow of Compute-Buffer-Region first, and then let’s think about how to make it perfectly fit in our usecase.
   > 
   > On each use-site of a buffer, we calculate the region it is touched by the loops above it, and then by unionizing those regions, it’s made possible to trim out those regions that are not touched.
   > 
   > As a simplest example, if the original size of a buffer in shared memory is `[1024, 1024]`, but only `[bx : bx + 10, by : by + 20]` is touched in a threadblock, then the buffer size could shrink to `[10, 20]` which is significantly smaller.
   > 
   > Then to provide a slightly more complicated example, if the buffer is touched in two different places, where the region are `[bx : bx + 10 : by : by + 20]`, `[bx + 10: bx + 20, by : by + 30]`, then we calculate the union of the two regions, i.e. `[bx : bx + 20, by : by + 30]`, and then do shrinking so that its size of `[20, 30]`.
   > 
   > The problem this PR aims to address is that some buffer’s size actually depends on irrelevant outer variables, for example, the shared memory size to be `[0, 960 - max(bx // 18 * 128, 832)]`, where `bx` is the extent of `blockIdx.x`. However, it takes an approach to over-relax the region when seeing any undesired variables (in our case, it’s `bx`), which is less optimal.
   > 
   > The actual solution should be: calculating the region union bound first, and then remove the undesired variables with something similar to analyzer's `const_int_bound` (note that we don’t want the bound to be strictly constant)
   
   @wrongtest
   
   > > calculating the region union bound first
   >
   > Many thanks!  It indeed should check loop dependency for union region. 
   > 
   > https://gist.github.com/junrushao1994/943eb0e9232a21a2748eb35af0fac4ff
   > In the case it seems we get extent=128 for `A_shared[128, 4]` after fix, even without const bound estimation... Perhaps the loop dependent region parts get covered during region unions?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org