Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2019/11/20 23:55:27 UTC
[GitHub] [incubator-tvm] yuluny2 commented on issue #4369: [Runtime] Add cusparse for sparse dense
URL: https://github.com/apache/incubator-tvm/pull/4369#issuecomment-556554623
> Hi @cylinbao, thanks for the work! This PR uses the cuSPARSE csrmm routine. There is another routine, csrmm2, which should be faster. According to its documentation [here](https://docs.nvidia.com/cuda/cusparse/index.html#csrmm2): “If op(B)=B, csrmm2() is the same as csrmm(); The motivation of transpose(B) is to improve the memory access of matrix B. The computational pattern of A * transpose(B) with matrix B in column-major order is equivalent to A * B with matrix B in row-major order.”
>
> I have done some benchmarking on the reddit graph, and csrmm2 with B transposed is 4x faster than with B not transposed.
>
> | feature length | csrmm2, dense matrix not transposed (ms) | csrmm2, dense matrix transposed (ms) |
> | --- | --- | --- |
> | 32 | 55.85 | 12.26 |
> | 64 | 111.51 | 25.16 |
> | 128 | 222.94 | 51.62 |
> | 256 | 445.66 | 104.77 |
> | 512 | 891.34 | 209.62 |
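The layout equivalence quoted above can be illustrated outside of cuSPARSE. The sketch below (a hypothetical NumPy demonstration, not the PR's code) shows that transpose(B) stored in column-major order occupies exactly the same bytes as B in row-major order, which is why A * transpose(B) can traverse B's memory contiguously:

```python
import numpy as np

# A stands in for the sparse operand; kept dense here for clarity.
A = np.arange(6, dtype=np.float32).reshape(2, 3)
B_row = np.arange(12, dtype=np.float32).reshape(3, 4)  # row-major (C order)

# Store transpose(B) in column-major (Fortran) order. Its raw bytes are
# identical to row-major B, so the product is unchanged while the memory
# of B is read contiguously.
B_t = np.asfortranarray(B_row.T)
assert B_t.flags.f_contiguous
assert B_t.tobytes(order="F") == B_row.tobytes(order="C")

# A @ transpose(B_t) gives the same result as A @ B_row.
assert np.allclose(A @ B_t.T, A @ B_row)
```

This is only an analogy for the memory-access pattern the cuSPARSE documentation describes; the actual speedup in the benchmark comes from csrmm2's GPU kernel exploiting that layout.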
The documentation says csrmm2() is deprecated and will be removed in the next release. Maybe they will speed up the current routine instead?