Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2020/07/14 09:39:35 UTC

[GitHub] [systemds] Baunsgaard commented on pull request #931: [SYSTEMDS-371-372][WIP] ColGroup Quantization

Baunsgaard commented on pull request #931:
URL: https://github.com/apache/systemds/pull/931#issuecomment-658081701


   @mboehm7 
   
   As requested, here is a comparison of performance before and after the changes. With this, I will finish committing to this branch so that reviews can begin.
   
   I have disabled two key features that should improve performance once re-implemented; I intend to slightly change the way they are done.
   
   - Dictionary sharing. I intend to enable sharing across different ColGroup types, since we now have a shared representation for dictionaries. I plan to move this step to before the construction of the ColGroups. This will allow the CompressedMatrixBlock object to store pointers to all the dictionaries, which speeds up value-only computations, and the ColGroups will then be oblivious to whether their dictionaries are shared.
   - CoCoding. This is currently disabled because (1) it increases compression time and (2) it does not improve the compression ratio on the covtype dataset.
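   
   To make the dictionary-sharing idea concrete, here is a minimal sketch of how I picture it. It is not the code in this PR: the classes `Dictionary`, `ColGroup` and `CompressedBlock` below are hypothetical stand-ins, and the real CompressedMatrixBlock and ColGroup classes look different. The block keeps one reference per unique dictionary, so a value-only operation such as scalar multiplication touches each dictionary once, no matter how many groups point at it.
   
   ```java
   import java.util.ArrayList;
   import java.util.IdentityHashMap;
   import java.util.List;
   import java.util.Map;
   
   public class SharedDictionarySketch {
   
       /** Distinct values of one or more column groups (hypothetical, not the SystemDS class). */
       static final class Dictionary {
           final double[] values;
           Dictionary(double[] values) { this.values = values; }
       }
   
       /** A column group holds per-row codes and a reference to a dictionary; it does not know if the dictionary is shared. */
       static final class ColGroup {
           final Dictionary dict;
           final int[] codes;
           ColGroup(Dictionary dict, int[] codes) { this.dict = dict; this.codes = codes; }
           double get(int row) { return dict.values[codes[row]]; }
       }
   
       /** Stand-in for the compressed block: owns the groups and one reference per unique dictionary. */
       static final class CompressedBlock {
           final List<ColGroup> groups = new ArrayList<>();
           final List<Dictionary> dictionaries = new ArrayList<>();
   
           void add(ColGroup g) {
               groups.add(g);
               for (Dictionary d : dictionaries)
                   if (d == g.dict) return; // already registered (identity comparison)
               dictionaries.add(g.dict);
           }
   
           /** Value-only operation: scale each unique dictionary once, then reuse the row codes unchanged. */
           CompressedBlock scalarMultiply(double s) {
               Map<Dictionary, Dictionary> scaled = new IdentityHashMap<>();
               for (Dictionary d : dictionaries) {
                   double[] v = new double[d.values.length];
                   for (int i = 0; i < v.length; i++)
                       v[i] = d.values[i] * s;
                   scaled.put(d, new Dictionary(v));
               }
               CompressedBlock out = new CompressedBlock();
               for (ColGroup g : groups)
                   out.add(new ColGroup(scaled.get(g.dict), g.codes));
               return out;
           }
       }
   
       public static void main(String[] args) {
           Dictionary shared = new Dictionary(new double[] {1.0, 2.0, 4.0});
           CompressedBlock block = new CompressedBlock();
           block.add(new ColGroup(shared, new int[] {0, 2, 1}));
           block.add(new ColGroup(shared, new int[] {1, 1, 0}));
           CompressedBlock result = block.scalarMultiply(3.0);
           System.out.println("unique dictionaries: " + result.dictionaries.size());
           System.out.println("group 0, row 1: " + result.groups.get(0).get(1));
       }
   }
   ```
   
   In this sketch the scalar operation costs time proportional to the number of distinct values, not the number of rows, and the result keeps the sharing intact because both output groups point at the same scaled dictionary.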
   
   Before (on master branch):
   ```code
   DATA      , RUN                      , TYPE                , TIME ms   , REP  
   covtype   , MatrixVector mv          , cla                 ,     1.980 ,   100
   covtype   , MatrixVector vm          , cla                 ,     3.310 ,   100
   covtype   , scalar mult              , cla                 ,     3.900 ,   100
   covtype   , scalar plus              , cla                 ,    13.180 ,   100
   covtype   , unaryAggregate sum       , cla                 ,     1.992 ,   500
   covtype   , unaryAggregate rowsum    , cla                 ,    23.740 ,   500
   covtype   , unaryAggregate colsum    , cla                 ,    24.556 ,   500
   covtype   , unaryAggregate colmax    , cla                 ,     0.122 ,   500
   covtype   , unaryAggregate max       , cla                 ,       nan ,     0
   covtype   , unaryAggregate min       , cla                 ,     0.100 ,   500
   covtype   , unaryAggregate rowmax    , cla                 ,    44.208 ,   500
   ```
   
   After:
   ```code
   DATA      , RUN                      , TYPE                , TIME ms   , REP  
   covtype   , MatrixVector mv          , cla                 ,     1.916 ,  1000
   covtype   , MatrixVector mv          , lcla                ,     1.752 ,  1000
   covtype   , MatrixVector vm          , cla                 ,     4.138 ,  1000
   covtype   , MatrixVector vm          , lcla                ,     3.764 ,  1000
   covtype   , scalar mult              , cla                 ,     0.157 ,  1000
   covtype   , scalar mult              , lcla                ,     0.129 ,  1000
   covtype   , scalar plus              , cla                 ,     0.249 ,  1000
   covtype   , scalar plus              , lcla                ,     0.212 ,  1000
   covtype   , unaryAggregate sum       , cla                 ,     0.828 ,   500
   covtype   , unaryAggregate sum       , lcla                ,     2.790 ,   500
   covtype   , unaryAggregate rowsum    , cla                 ,    12.075 ,  3000
   covtype   , unaryAggregate rowsum    , lcla                ,    33.120 ,  3000
   covtype   , unaryAggregate colsum    , cla                 ,     0.834 ,   500
   covtype   , unaryAggregate colsum    , lcla                ,     2.886 ,   500
   covtype   , unaryAggregate colmax    , cla                 ,     0.259 ,  3000
   covtype   , unaryAggregate colmax    , lcla                ,     0.039 ,  3000
   covtype   , unaryAggregate max       , cla                 ,     0.142 ,   500
   covtype   , unaryAggregate max       , lcla                ,     0.064 ,   500
   covtype   , unaryAggregate min       , cla                 ,     0.170 ,   500
   covtype   , unaryAggregate min       , lcla                ,     0.118 ,   500
   covtype   , unaryAggregate rowmax    , cla                 ,    31.253 ,  3000
   covtype   , unaryAggregate rowmax    , lcla                ,    69.297 ,  3000
   ```
   
   Uncompressed performance:
   
   ```code
   DATA      , RUN                      , TYPE                , TIME ms   , REP  
   covtype   , MatrixVector mv          , ula                 ,     6.230 ,  1000
   covtype   , MatrixVector vm          , ula                 ,     8.895 ,  1000
   covtype   , scalar mult              , ula                 ,    34.050 ,   300
   covtype   , scalar plus              , ula                 ,    63.683 ,   300
   covtype   , unaryAggregate sum       , ula                 ,     7.146 ,   500
   covtype   , unaryAggregate rowsum    , ula                 ,    10.895 ,  3000
   covtype   , unaryAggregate colsum    , ula                 ,     8.268 ,   500
   covtype   , unaryAggregate colmax    , ula                 ,     7.886 ,  3000
   covtype   , unaryAggregate max       , ula                 ,     7.116 ,   500
   covtype   , unaryAggregate min       , ula                 ,     7.508 ,   500
   covtype   , unaryAggregate rowmax    , ula                 ,     8.403 ,  3000
   ```

