You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/01/04 03:16:07 UTC

[GitHub] [tvm] masahi opened a new pull request #7195: [THRUST] Faster multi dimensional argsort by segmented sort

masahi opened a new pull request #7195:
URL: https://github.com/apache/tvm/pull/7195


   Current implementation of thrust argsort, when given multi dimensional inputs to sort along the inner most axis, is very inefficient: it does `n_iter` calls to thrust sort. See
   
   https://github.com/apache/tvm/blob/bad149ed8a555444d813537608ee5cea9e95e97e/src/runtime/contrib/thrust/thrust.cu#L50-L65
   
   When the outer dimension is large, the performance of thrust argsort is far from optimal. In particular, the thrust numbers in shown in https://github.com/apache/tvm/pull/7099 do not reflect the true performance thrust can achieve. 
   
   This PR replaces `n_iter` calls to thrust argsort with one segmented sort by key. Since thrust doesn't provide API to do sort by key, I used a neat back-to-back stable-sort-by-key trick explained in https://groups.google.com/forum/#!topic/thrust-users/BoLsxO6b4FY. My implementation is a bit more complicated because we need to do segmented sort **by key**, not just segmented sort. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] mbrookhart commented on pull request #7195: [THRUST] Faster multi dimensional argsort by segmented sort

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #7195:
URL: https://github.com/apache/tvm/pull/7195#issuecomment-754096681


   Also, I think you and I are using different versions of CUDA for the same GPU, that might explain the difference in the numbers I posted in #7099 and you posted here.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] masahi commented on pull request #7195: [THRUST] Faster multi dimensional argsort by segmented sort

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7195:
URL: https://github.com/apache/tvm/pull/7195#issuecomment-758343079


   @Laurawly @kazum This is ready to go, please have a look


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] mbrookhart commented on pull request #7195: [THRUST] Faster multi dimensional argsort by segmented sort

Posted by GitBox <gi...@apache.org>.
mbrookhart commented on pull request #7195:
URL: https://github.com/apache/tvm/pull/7195#issuecomment-754095011


   This looks great. My only concern would possibly be that some object detection models (I'm thinking gluon SSD) have a very large number of boxes they sort before NMS. Could you add shapes (1, 1e5) and (1, 1e6) to your test? I expect my mergesort will fail badly, but I wonder what the difference between your implementation and the current thrust implementation will be.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] masahi commented on pull request #7195: [THRUST] Faster multi dimensional argsort by segmented sort

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7195:
URL: https://github.com/apache/tvm/pull/7195#issuecomment-754199021


   @mbrookhart I have a fast-path for one segment case, so the perf is the same between current / new. I'll update the condition to work for dimension other then two.
   https://github.com/apache/tvm/blob/26254f522de531569441eac4fecb45885fcdc30a/src/runtime/contrib/thrust/thrust.cu#L57
   
   @trevor-m Yes I briefly looked at cub's segmented sort. My impression is that it launches one thread block per segment. This sounds great when there are many segments to sort and each of segment is not so big. I'm not sure if that is a good fit for our use case - I think we are more likely to sort a few, but large segments, and most likely we only have one segment. I'm actually surprised to hear that TRT uses cub's segmented sort.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] trevor-m commented on pull request #7195: [THRUST] Faster multi dimensional argsort by segmented sort

Posted by GitBox <gi...@apache.org>.
trevor-m commented on pull request #7195:
URL: https://github.com/apache/tvm/pull/7195#issuecomment-754151980


   Nice! Have you also looked at CUB's [DeviceSegmentedRadixSort::SortPairsDescending](https://nvlabs.github.io/cub/structcub_1_1_device_segmented_radix_sort.html#abbba6639f928bebd19435f185ea10618) ? It sounds like it is exactly what you need with no tricks required. It's used by some fast NMS implementations such as [TensorRT](https://github.com/NVIDIA/TensorRT/tree/master/plugin/batchedNMSPlugin).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] masahi edited a comment on pull request #7195: [THRUST] Faster multi dimensional argsort by segmented sort

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7195:
URL: https://github.com/apache/tvm/pull/7195#issuecomment-754199021


   @mbrookhart I have a fast-path for one segment case, so the perf is the same between current / new. I'll update the condition to work for dimension other than two.
   https://github.com/apache/tvm/blob/26254f522de531569441eac4fecb45885fcdc30a/src/runtime/contrib/thrust/thrust.cu#L57
   
   @trevor-m Yes I briefly looked at cub's segmented sort. My impression is that it launches one thread block per segment. This sounds great when there are many segments to sort and each of segment is not so big. I'm not sure if that is a good fit for our use case - I think we are more likely to sort a few, but large segments, and most likely we only have one segment. I'm actually surprised to hear that TRT uses cub's segmented sort.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] masahi commented on pull request #7195: [THRUST] Faster multi dimensional argsort by segmented sort

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7195:
URL: https://github.com/apache/tvm/pull/7195#issuecomment-759241934


   Thanks @mbrookhart @trevor-m 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] masahi merged pull request #7195: [THRUST] Faster multi dimensional argsort by segmented sort

Posted by GitBox <gi...@apache.org>.
masahi merged pull request #7195:
URL: https://github.com/apache/tvm/pull/7195


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org