Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/11/10 17:34:30 UTC

[GitHub] ZiyueHuang opened a new pull request #8611: optimization for dot(csr.T, dense) = rsp

ZiyueHuang opened a new pull request #8611: optimization for dot(csr.T, dense) = rsp
URL: https://github.com/apache/incubator-mxnet/pull/8611
 
 
   ## Description ##
   
    Use a prefix sum to compute `nnr` (the number of non-zero rows) so that the row_sparse output can be allocated with exactly that many rows.
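
    Below is a minimal NumPy/SciPy sketch of the prefix-sum idea, for illustration only (the function name `dot_csrT_dense_to_rsp` is made up here; this is not the MXNet kernel): mark every column of the csr matrix that holds at least one non-zero, take a prefix sum over the marks to obtain both `nnr` and the compact output slot of each column, then allocate the row_sparse values buffer at exactly `(nnr, n)`.

    ```
    import numpy as np
    import scipy.sparse as sp

    def dot_csrT_dense_to_rsp(csr, dense):
        """csr: (m, k) SciPy CSR matrix, dense: (m, n) NumPy array.
        Returns (row_idx, values) of the (k, n) row_sparse result."""
        k = csr.shape[1]
        occupied = np.zeros(k, dtype=np.int64)
        occupied[csr.indices] = 1           # mark columns that appear in csr
        prefix = np.cumsum(occupied)        # prefix sum over the marks
        nnr = int(prefix[-1])               # number of non-zero output rows
        row_idx = np.flatnonzero(occupied)  # original row ids kept in the output
        values = np.zeros((nnr, dense.shape[1]), dtype=dense.dtype)
        # scatter csr[r, c] * dense[r, :] into the compact slot of column c
        for r in range(csr.shape[0]):
            for p in range(csr.indptr[r], csr.indptr[r + 1]):
                c = csr.indices[p]
                values[prefix[c] - 1] += csr.data[p] * dense[r]
        return row_idx, values

    # quick check on a small random input
    csr = sp.random(128, 1000, density=0.01, format="csr")
    dense = np.random.rand(128, 4)
    row_idx, values = dot_csrT_dense_to_rsp(csr, dense)
    assert np.allclose(values, np.asarray(csr.T.dot(dense))[row_idx])
    ```

    The prefix sum gives the compact slot of every column in one cheap pass, so the output can be sized before the main compute loop runs.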
   
    Currently `dot(csr.T, dense) = rsp` allocates a dense output and then casts it to row_sparse, without freeing the unused memory.
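
    To put the memory difference in perspective (assuming float32, as in the benchmark shapes below): with k = 1,000,000 and n = 1000, the dense intermediate is a 1,000,000 x 1000 buffer, roughly 4 GB, while the row_sparse result only needs about nnr x 1000 x 4 bytes for its values plus the nnr row indices.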
   
    I used `run_benchmark(context, lhs="csr", rhs="default", lhs_trans=True, ...)` in `mxnet/benchmark/python/sparse/dot.py` for the numbers below. Please correct me if I'm wrong.
   
    But is `dot(csr.T, dense) = rsp` in master really this slow? The numbers might be skewed because others were using my machine at the same time.
   
    Performance of the original `dot(csr.T, dense) = rsp`:
   
   ```
   [hanfeng@model-gpu00:sparse]$ python dot.py --num-omp-threads 20
   ========================================================
     mxnet sparse dot benchmark: dot(csr, default) = default
     (matrix multiplication: (m x k)^T * (k x n) = m x n)
   ========================================================
    lhs_density(%)  rhs_density(%)    context        m        k        n  t_sparse(ms)   t_dense(ms)  speedup
               1.0           100.0     cpu(0)      128  1000000      256        366.19        135.76     0.37
               1.0           100.0     cpu(0)      128  1000000     1000       1327.12        503.92     0.38
               1.0           100.0     cpu(0)      128  1000000     1000       1237.33        454.01     0.37
               1.0           100.0     cpu(0)       64  1000000     1000        868.38        345.38     0.40
               1.0           100.0     cpu(0)      128  1000000     1000       1237.09        437.32     0.35
   ```
   
    After this PR:
   ```
   [hanfeng@model-gpu00:sparse]$ python dot.py --num-omp-threads 20
   ========================================================
     mxnet sparse dot benchmark: dot(csr, default) = default
     (matrix multiplication: (m x k)^T * (k x n) = m x n)
   ========================================================
    lhs_density(%)  rhs_density(%)    context        m        k        n  t_sparse(ms)   t_dense(ms)  speedup
               1.0           100.0     cpu(0)      128  1000000      256         83.90        137.18     1.64
               1.0           100.0     cpu(0)      128  1000000     1000        410.63        448.30     1.09
               1.0           100.0     cpu(0)      128  1000000     1000        467.91        492.87     1.05
               1.0           100.0     cpu(0)       64  1000000     1000        259.99        348.32     1.34
               1.0           100.0     cpu(0)      128  1000000     1000        481.77        416.20     0.86
   ```
   cc @eric-haibin-lin 
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage
   - [x] For user-facing API changes, API doc string has been updated.
    - [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [x] unittests already exist
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be made.
    - Interesting edge cases to note here
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services