Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2019/11/20 23:44:13 UTC

[GitHub] [incubator-tvm] Huyuwei commented on issue #4369: [Runtime] Add cusparse for sparse dense

URL: https://github.com/apache/incubator-tvm/pull/4369#issuecomment-556551928
 
 
   Hi @cylinbao, thanks for the work! This PR uses the cuSPARSE csrmm routine. There is another routine, csrmm2, which should be faster. According to its documentation [here](https://docs.nvidia.com/cuda/cusparse/index.html#csrmm2): “If op(B)=B, csrmm2() is the same as csrmm(); The motivation of transpose(B) is to improve the memory access of matrix B. The computational pattern of A * transpose(B) with matrix B in column-major order is equivalent to A * B with matrix B in row-major order.”
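   The equivalence the docs describe can be checked directly: a row-major B and a column-major transpose(B) occupy the same bytes, so csrmm2 can consume transpose(B) without any data movement. A minimal NumPy/SciPy sketch (shapes here are arbitrary, not taken from the PR):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical shapes for illustration: sparse A is 8x6, dense B is 6x4.
A = sp.random(8, 6, density=0.3, format="csr", random_state=0)
B = np.random.default_rng(0).standard_normal((6, 4))  # row-major (C order)

# B.T is a column-major view over B's buffer: transpose(B) in
# column-major order is byte-identical to B in row-major order,
# which is why csrmm2 gets the transposed layout "for free".
Bt = B.T  # 4x6, F-contiguous view, no copy
assert Bt.flags.f_contiguous
assert Bt.tobytes(order="A") == B.tobytes()

# Mathematically A * transpose(Bt) == A * B, so the results agree.
C_direct = A @ B
C_via_transpose = A @ Bt.T
assert np.allclose(C_direct, C_via_transpose)
```

   The payoff on the GPU is coalesced loads of B's rows; the arithmetic is unchanged.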
   
   I have done some benchmarking on the Reddit graph, and csrmm2 with B transposed is about 4x faster than with B not transposed.
   
   feature length | csrmm2, dense matrix not transposed (ms) | csrmm2, dense matrix transposed (ms)
   :--    | :--   | :--
   32 | 55.85 | 12.26
   64 | 111.51 | 25.16
   128 | 222.94 | 51.62
   256 | 445.66 | 104.77
   512 | 891.34 | 209.62
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services