Posted to dev@systemml.apache.org by GitBox <gi...@apache.org> on 2020/03/29 09:31:31 UTC

[GitHub] [systemml] Baunsgaard opened a new pull request #872: [WIP][SYSTEMDS-273] Refactor Compressed Package

Baunsgaard opened a new pull request #872: [WIP][SYSTEMDS-273] Refactor Compressed Package
URL: https://github.com/apache/systemml/pull/872
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on issue #872: [SYSTEMDS-273] Refactor Compressed Package

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on issue #872: [SYSTEMDS-273] Refactor Compressed Package
URL: https://github.com/apache/systemml/pull/872#issuecomment-615277499
 
 
   Furthermore, there are some remaining issues in the size estimation of joined ColGroups: groups containing multiple columns are estimated to be too large.


[GitHub] [systemml] Baunsgaard commented on issue #872: [SYSTEMDS-273] Refactor Compressed Package

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on issue #872: [SYSTEMDS-273] Refactor Compressed Package
URL: https://github.com/apache/systemml/pull/872#issuecomment-615276665
 
 
   Major refactor of the compressed MatrixBlock to "simplify" responsibilities.
   Many changes in compression planning, especially in memory estimation.
   
   Feedback appreciated :+1:
   
   The remaining errors are mainly in the sparse estimation of compression sizes, but also include:
   
   - The estimated number of distinct values is off when the input is sparse, because the wrong sample-based estimators are used.
   - The bitmaps encoded for extracting column facts do not record whether a 0 is present in the column.
   - A bug (which I intend to fix today) in unary operators when compressing with a specific compression scheme.
   
   Hopefully, if we merge, the bugs mentioned above can be fixed within a reasonable time.
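
To illustrate the second point in the list above, here is a minimal sketch (not the code in this PR) of a column bitmap that also records whether a zero occurs. The class `ColumnBitmapSketch` and all of its members are hypothetical names chosen for illustration. The sketch assumes the bitmap maps each distinct non-zero value to the row offsets where it occurs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not a class from the compressed package.
final class ColumnBitmapSketch {
    // Distinct non-zero values mapped to the row offsets where they occur.
    final Map<Double, List<Integer>> offsets = new LinkedHashMap<>();
    // Recorded explicitly so that size estimators do not have to infer it.
    boolean containsZero = false;

    static ColumnBitmapSketch extract(double[] column) {
        ColumnBitmapSketch bitmap = new ColumnBitmapSketch();
        for (int row = 0; row < column.length; row++) {
            double value = column[row];
            if (value == 0) {
                bitmap.containsZero = true; // zeros are not stored as offsets
            } else {
                bitmap.offsets.computeIfAbsent(value, k -> new ArrayList<>()).add(row);
            }
        }
        return bitmap;
    }

    // Number of distinct values in the column, counting zero when present.
    int numDistinct() {
        return offsets.size() + (containsZero ? 1 : 0);
    }
}
```

Without such a flag, a fully dense column and a column that also contains zeros yield the same set of keys, so any distinct-value count derived from the bitmap alone can be off by one.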
   
   Below is an extract of the different changes:
   
   - Separated sub-parts of compression into different packages.
   - Added worst-case memory footprint calculations for arrays.
   - Moved the compressed size estimation calculation into the specific ColGroups.
   
   - Extensive testing of size estimation for ColGroups and compression
     - JOL memory estimate tests for compression blocks
     - Worst-case JOL estimates, assuming an uncompressed 64-bit JVM
     - Ideal input generator for testing ColGroup compression
   
   - Factory pattern added for selected constructors
     - ColGroups
       - Renamed ColGroupCompressor to ColGroupFactory
     - CompressedMatrixBlock
   
   - Enabled parallel execution of the ColGrouping.
   
   - Added a settings file for compression to enable selection of specific
     compression types.
   
   - Added an abstract compressed block for overriding the default MatrixBlock.
   
   - Added test libraries to pom.xml
     - JOL, the memory estimation framework from OpenJDK, to measure object sizes
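
As a rough illustration of the worst-case array footprint calculation mentioned in the list above, the following sketch estimates the heap size of a primitive array. It is not the code from this PR: the class and method names are hypothetical, and the constants assume a 64-bit HotSpot JVM without compressed references (a 24-byte array header and 8-byte object alignment).

```java
// Hypothetical illustration; names and constants are assumptions, not the PR's code.
final class ArrayFootprintSketch {
    // Assumed header: 8 bytes mark word + 8 bytes class pointer
    // + 4 bytes length + 4 bytes padding.
    private static final long ARRAY_HEADER_BYTES = 24;
    // Assumed object alignment of the JVM.
    private static final long ALIGNMENT_BYTES = 8;

    private ArrayFootprintSketch() {}

    // Header plus payload, rounded up to the next multiple of the alignment.
    static long arrayCost(long length, long bytesPerElement) {
        long raw = ARRAY_HEADER_BYTES + length * bytesPerElement;
        return (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    static long byteArrayCost(long length) {
        return arrayCost(length, Byte.BYTES);
    }

    static long intArrayCost(long length) {
        return arrayCost(length, Integer.BYTES);
    }

    static long doubleArrayCost(long length) {
        return arrayCost(length, Double.BYTES);
    }
}
```

Under these assumptions, `ArrayFootprintSketch.intArrayCost(1)` gives 32 bytes: 24 bytes of header plus 4 bytes of payload, rounded up to the next multiple of 8. An estimate of this kind is the sort of value a test could compare against the size JOL measures for the same array.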
   
   
