Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2019/11/20 23:55:27 UTC

[GitHub] [incubator-tvm] yuluny2 commented on issue #4369: [Runtime] Add cusparse for sparse dense

yuluny2 commented on issue #4369: [Runtime] Add cusparse for sparse dense
URL: https://github.com/apache/incubator-tvm/pull/4369#issuecomment-556554623
 
 
   > Hi @cylinbao, thanks for the work! This PR uses cusparse csrmm routine. There is another routine csrmm2 which should be faster. According to its document [here](https://docs.nvidia.com/cuda/cusparse/index.html#csrmm2): “If op(B)=B, csrmm2() is the same as csrmm(); The motivation of transpose(B) is to improve the memory access of matrix B. The computational pattern of A * transpose(B) with matrix B in column-major order is equivalent to A * B with matrix B in row-major order.”
   > 
   > I have done some benchmarking on the Reddit graph, and csrmm2 with B transposed is about 4x faster than without the transpose.
   > 
   > feature length | csrmm2, B not transposed (ms) | csrmm2, B transposed (ms)
   > 32             | 55.85                         | 12.26
   > 64             | 111.51                        | 25.16
   > 128            | 222.94                        | 51.62
   > 256            | 445.66                        | 104.77
   > 512            | 891.34                        | 209.62
   
    The linked documentation says csrmm2() is deprecated and will be removed in the next release. Maybe they will speed up the current routine instead?
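
For reference, below is a minimal sketch of the two csrmm2 call patterns discussed in the quoted comment. This is not the code from this PR; it assumes single-precision data and the pre-CUDA-11 cusparseScsrmm2() API, and the wrapper function and argument names are illustrative only. The key point is that cuSPARSE treats dense matrices as column-major, so a k x n row-major B already has the memory layout of an n x k column-major matrix and can be passed unchanged together with op(B) = transpose.

#include <cusparse.h>

/* Compute C = A * B, where A is an m x k CSR sparse matrix, B is the
 * dense feature matrix, and C is m x n in column-major order (ldc = m).
 * Illustrative wrapper only; error checking omitted. */
void sparse_dense_csrmm2(cusparseHandle_t handle, cusparseMatDescr_t descrA,
                         int m, int n, int k, int nnz,
                         const float *csrValA, const int *csrRowPtrA,
                         const int *csrColIndA,
                         const float *B, float *C, int use_transposed_B) {
  const float alpha = 1.0f, beta = 0.0f;

  if (!use_transposed_B) {
    /* op(B) = B: B is k x n in column-major order, ldb = k. */
    cusparseScsrmm2(handle,
                    CUSPARSE_OPERATION_NON_TRANSPOSE,  /* op(A) */
                    CUSPARSE_OPERATION_NON_TRANSPOSE,  /* op(B) */
                    m, n, k, nnz, &alpha, descrA,
                    csrValA, csrRowPtrA, csrColIndA,
                    B, /*ldb=*/k, &beta, C, /*ldc=*/m);
  } else {
    /* op(B) = B^T: the same buffer is interpreted as an n x k
     * column-major matrix (ldb = n), which is exactly the layout of a
     * k x n row-major B, so no explicit transpose or copy is needed.
     * csrmm2 then streams through B with better memory access, which
     * is where the reported speedup comes from.  Per the cuSPARSE
     * documentation, op(A) must be non-transpose in this mode. */
    cusparseScsrmm2(handle,
                    CUSPARSE_OPERATION_NON_TRANSPOSE,  /* op(A) */
                    CUSPARSE_OPERATION_TRANSPOSE,      /* op(B) */
                    m, n, k, nnz, &alpha, descrA,
                    csrValA, csrRowPtrA, csrColIndA,
                    B, /*ldb=*/n, &beta, C, /*ldc=*/m);
  }
}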
