Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/09/29 19:21:40 UTC

[GitHub] [incubator-tvm] ANSHUMAN87 commented on pull request #6580: Faster sparse_dense on GPUs

ANSHUMAN87 commented on pull request #6580:
URL: https://github.com/apache/incubator-tvm/pull/6580#issuecomment-700929567


   > I've written a faster sparse_dense for GPUs using tir. This sparse_dense requires a padded matrix, so I've added a new op sparse_dense_padded. AlterOpLayout should transform sparse_dense to sparse_dense_padded when using a gpu.
   > 
   > This new sparse_dense improves prunebert performance from 155.41ms mean to 7.75ms mean. In general, this implementation is faster than cublas dense on matrices with density < 0.05 and is often faster than cusparse for machine learning workloads.
   
   @tkonolige: Thanks for the PR! The data looks quite impressive :+1:
   I was wondering whether we could add some sort of benchmark test case here, tuned to your shared data?
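   A benchmark test case might look roughly like the following CPU-side sketch. This uses scipy rather than TVM's runtime, and the shapes and density are illustrative assumptions (not the PR's actual prunebert configuration); it only demonstrates checking a sparse weight matmul against its dense reference at a density below 0.05:

   ```python
   import numpy as np
   import scipy.sparse as sp

   # Hypothetical shapes, loosely in the spirit of a pruned transformer layer (assumption)
   M, K, N = 128, 768, 3072
   density = 0.05  # the regime where the PR reports sparse beats dense

   rng = np.random.default_rng(0)
   # Sparse weight matrix W of shape (N, K) in CSR format
   W = sp.random(N, K, density=density, format="csr",
                 random_state=0, dtype=np.float32)
   X = rng.standard_normal((M, K)).astype(np.float32)

   # sparse_dense computes X @ W.T; do it via sparse @ dense, then transpose
   Y_sparse = (W @ X.T).T
   # Dense reference for correctness checking
   Y_dense = X @ W.toarray().T

   assert Y_sparse.shape == (M, N)
   assert np.allclose(Y_sparse, Y_dense, atol=1e-4)
   ```

   A real test in TVM would of course drive the `sparse_dense_padded` op on a GPU target and time it against the dense path, but the structure (random CSR weights at a fixed density, dense reference check) would be similar.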
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org