Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/01/19 16:45:25 UTC

[GitHub] [tvm] mbrookhart commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

mbrookhart commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-762969945


   Scan is probably the most hand-optimized kernel in Thrust, so I'm thrilled to be within 10x of it with a cross-GPU kernel. Overall I'm happy with this, but I have two thoughts.
   
   1. Should we add the TIR inclusive scan back in? I have that on a branch from my first implementation of get_valid_counts: https://github.com/mbrookhart/tvm/commit/944ee3c62d3176e86d555c85097c45c88d082204
   2. We should probably generalize this to arbitrary rank. I think we can use the same kind of before/after trick used in sort: https://github.com/apache/tvm/blob/f91b51d638874973a2d9ccbcb4d49cf7c668f516/python/tvm/topi/cuda/sort.py#L69-L85
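
To make both points concrete, here is a rough sketch in plain Python rather than TIR. It assumes a row-major flat buffer, and the names `split_axes` and `scan_flat` are hypothetical, not taken from the PR or from sort.py. The idea is to collapse every dimension before the scan axis into one extent and every dimension after it into another, so a scan of any rank becomes a loop nest over (before, axis, after). The `inclusive` flag shows how small the difference between the two scan variants is.

```python
def split_axes(shape, axis):
    """Collapse an N-d shape into (before, axis_len, after) around `axis`."""
    if axis < 0:
        axis += len(shape)
    before, after = 1, 1
    for i, dim in enumerate(shape):
        if i < axis:
            before *= dim  # product of the dimensions left of the scan axis
        elif i > axis:
            after *= dim   # product of the dimensions right of the scan axis
    return before, shape[axis], after


def scan_flat(data, shape, axis, inclusive=False):
    """Prefix-sum a row-major flat buffer along `axis` (sequential reference)."""
    before, length, after = split_axes(shape, axis)
    out = [0] * len(data)
    for b in range(before):
        for a in range(after):
            running = 0
            for k in range(length):
                # Flat index of element (b, k, a) in the collapsed 3-d view.
                idx = (b * length + k) * after + a
                if inclusive:
                    running += data[idx]
                    out[idx] = running
                else:
                    out[idx] = running
                    running += data[idx]
    return out


data = [1, 2, 3, 4, 5, 6]  # a 2x3 tensor stored row-major
exclusive_rows = scan_flat(data, (2, 3), axis=1)
inclusive_rows = scan_flat(data, (2, 3), axis=1, inclusive=True)
exclusive_cols = scan_flat(data, (2, 3), axis=0)
```

This is only a sequential reference for the indexing. A GPU version would presumably parallelize over the before and after extents and keep the parallel scan along the middle axis.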

