You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/01/18 07:10:19 UTC

[GitHub] [tvm] masahi opened a new pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

masahi opened a new pull request #7303:
URL: https://github.com/apache/tvm/pull/7303


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] mbrookhart commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

mbrookhart commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-762969945


   Scan is probably the most hand-optimized kernel in thrust, I'm thrilled to be within 10x for a cross-GPU kernel. Overall I'm happy with this, but I have 2 thoughts.
   
   1. Should we add the TIR inclusive scan back in? I have that on a branch from my first implementation of get_valid_counts: https://github.com/mbrookhart/tvm/commit/944ee3c62d3176e86d555c85097c45c88d082204
   2. We should probably generalize for rank, I think maybe we can use the same kind of before/after trick used in sort: https://github.com/apache/tvm/blob/f91b51d638874973a2d9ccbcb4d49cf7c668f516/python/tvm/topi/cuda/sort.py#L69-L85


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] anijain2305 commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

anijain2305 commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763297967


   Thanks :) LGTM :)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] masahi commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

masahi commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763264832


   > Once it is merged, I can try on my end with TF models as well.
   
   Perf improvement is not expected, since it only improves `get_valid_count` slightly if you use thrust scan instead of TIR scan. The purpose of this PR is to enable parallelization for other ops, that are difficult without it. `argwhere` is a perfect example that I'll demonstrate soon after this one.
   
   @anijain2305 The term you want to search for is "gpu stream compaction".


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] masahi commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

masahi commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763257824


   hmm interesting, I've never created a test case with empty tensor, is that possible?
   
   Note that the IR is copied straight from https://github.com/apache/tvm/pull/7303, so the same guard against empty tensor is here.
   
   https://github.com/apache/tvm/blob/4e13a3f4a04300113e9332ef581859cb0a40a082/python/tvm/topi/cuda/scan.py#L59


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] masahi commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

masahi commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763456381


   Thanks @mbrookhart @anijain2305 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] masahi commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

masahi commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763291062


   @anijain2305 I added an empty tensor test in https://github.com/apache/tvm/pull/7303/commits/20afc3243a17f48084204855f498c7f9af1cad7a
   
   OpenCL seems to have a problem with 0 size buffer, but otherwise both TIR scan and thrust scan seem to have no issue. Please take a look.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] mbrookhart commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

mbrookhart commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763245797


   Yeah, scanning on the non-inner axis will have a cache locality performance hit, but I'm honestly not sure if that would be better or worse than the overhead from doing a pair of reshape/transpose ops. Reshape and transpose are heavily limited by memory bandwidth.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] masahi commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

masahi commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763106769


   1. Right now, inclusive scan can be supported by `exclusive_scan(data) + data`. I think that is fine for now, given that our scan IR is far from stable and we don't want to maintain two IRs for the sake of removing the additional sum.
   
   2. Yes, we can definitely do that. But this PR is already not small and I want to keep the original IR as close as possible for this PR. There are other TODO items for scan (e.g. support other binary ops), so I hope we can address this problem in the future as well.
   
   A related discussion point: Do you expect scan performance on non-innermost axis to be slower than the innermost case? If that's the case (which I believe yes), I think supporting non innermost scan and other ranks by 
   ```
   reshape + transpose + innermost scan + reshape and transpose back 
   ```
   is a good solution. It is definitely preferred in terms of implementation simplicity, allowing scan implementation to focus on 1 or 2D + innermost axis.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] anijain2305 commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

anijain2305 commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763263064


   Yes, I eyeballed the changes wrt to empty tensor and they looked good. So, I am happy to approve this PR. Once it is merged, I can try on my end with TF models as well.
   
   For empty tensor test case - https://github.com/apache/tvm/blob/main/tests/python/relay/test_any.py#L879


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] masahi merged pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

masahi merged pull request #7303:
URL: https://github.com/apache/tvm/pull/7303


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] masahi edited a comment on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

masahi edited a comment on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763106769


   1. Right now, inclusive scan can be supported by `exclusive_scan(data) + data`. I think that is fine for now, given that our scan IR is far from stable and we don't want to maintain two IRs for the sake of removing the additional sum.
   
   2. Yes, we can definitely do that. But this PR is already not small and I want to keep the original IR as close as possible for this PR. There are other TODO items for scan (e.g. support other binary ops), so I hope we can address this problem in the future as well.
   
   A related discussion point: Do you expect scan performance on non-innermost axis to be slower than the innermost case? If that's the case (which I believe yes), I think supporting non innermost scan and other ranks by 
   ```
   reshape + transpose -> innermost scan -> reshape and transpose back 
   ```
   is a good solution. It is definitely preferred in terms of implementation simplicity, allowing scan implementation to focus on 1 or 2D + innermost axis.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] masahi edited a comment on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

Posted by GitBox <gi...@apache.org>.

masahi edited a comment on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763291062


   @anijain2305 I added an empty tensor test in https://github.com/apache/tvm/pull/7303/commits/a6c740348282c2d13f22883e62c7c910b73ad8c2
   
   OpenCL seems to have a problem with 0 size buffer, but otherwise both TIR scan and thrust scan seem to have no issue. Please take a look.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org