Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2016/07/28 04:52:20 UTC

[jira] [Commented] (SYSTEMML-413) Runtime refactoring core matrix block library

    [ https://issues.apache.org/jira/browse/SYSTEMML-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396904#comment-15396904 ] 

Matthias Boehm commented on SYSTEMML-413:
-----------------------------------------

[~freiss] that's a good start - a couple of additions:

1) Input/output: all the readers/writers are in 'org.apache.sysml.runtime.io' - similar to the new frame readers and writers, the existing sequential and parallel readers should be consolidated too. Casting functionality and conversion to/from external representations can be found in org.apache.sysml.runtime.util.DataConverter.
2) Operation libraries: Some of the performance-critical code is in our LibMatrix* classes. I would like to keep these classes, especially LibMatrixMult, LibMatrixDatagen, LibMatrixReorg, LibMatrixBincell, and LibMatrixAgg, isolated because they are already quite large in code size.
3) Frames: One thing to keep in mind is that the buffer pool and some other places are implemented in a generic manner against CacheBlocks with MatrixBlock and FrameBlock implementing this abstraction. Any refactoring would need to consider this.
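
For illustration, here is a minimal sketch of the CacheBlock idea from point 3. The interface and method names below are hypothetical simplifications, not SystemML's actual API; the point is that the buffer pool operates only against the abstraction, so MatrixBlock and FrameBlock stay interchangeable and any refactoring must preserve that contract.

```java
import java.util.List;

// Hypothetical, simplified CacheBlock abstraction (illustrative names only).
interface CacheBlock {
    long getInMemorySize(); // size estimate the buffer pool uses for eviction
}

// A toy dense matrix block: 8 bytes per double cell.
class MatrixBlock implements CacheBlock {
    private final int rows, cols;
    MatrixBlock(int rows, int cols) { this.rows = rows; this.cols = cols; }
    public long getInMemorySize() { return (long) rows * cols * 8; }
}

// A toy frame block: assume a rough 16 bytes per heterogeneous cell.
class FrameBlock implements CacheBlock {
    private final int rows, cols;
    FrameBlock(int rows, int cols) { this.rows = rows; this.cols = cols; }
    public long getInMemorySize() { return (long) rows * cols * 16; }
}

class BufferPoolDemo {
    // The buffer pool sees only CacheBlock, never the concrete block types.
    static long totalPoolSize(List<? extends CacheBlock> pool) {
        return pool.stream().mapToLong(CacheBlock::getInMemorySize).sum();
    }

    public static void main(String[] args) {
        List<CacheBlock> pool =
            List.of(new MatrixBlock(1000, 1000), new FrameBlock(1000, 10));
        System.out.println(totalPoolSize(pool));
    }
}
```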

> Runtime refactoring core matrix block library
> ---------------------------------------------
>
>                 Key: SYSTEMML-413
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-413
>             Project: SystemML
>          Issue Type: Task
>          Components: Runtime
>            Reporter: Matthias Boehm
>
> Pull the local (non-distributed) linear algebra components of SystemML into a separate package. Define a proper object-oriented Java API for creating and manipulating local matrices. Document this API. Refactor all tests of local linear algebra functionality so that those tests use the new API. Refactor the distributed linear algebra operators (both Spark and Hadoop map-reduce) to use the new APIs for local linear algebra. 
> *Overall Refactoring Plan*
> The MatrixBlock class will be the core locus of refactoring. The file is over 6000 lines long, has dependencies on the HOPS and LOPS layers, and contains a lot of sparse matrix code that really ought to be in SparseBlock. Even if it’s modified in place, MatrixBlock will bear little resemblance to its current form after the refactoring is completed. I recommend setting aside the current MatrixBlock class and creating new classes with equivalent functionality by copying appropriate blocks of code from the old class. 
> Major changes to make relative to MatrixBlock:
> * We should create a new DenseMatrixBlock class that only covers dense linear algebra.
> * Sparse-specific code should be moved into the SparseBlock class. 
> * Common functionality across dense and sparse should go into the MatrixValue superclass.
> * There should be a new class with a name like “Matrix” (we’ll need one anyway to serve as the public API) that contains a pointer to a MatrixValue and can switch between different representations. Ideally this class should be designed so that, in the future, it can serve as a matrix ADT that will wrap both local and distributed linear algebra.
> * Several fields (maxrow, maxcolumn, numGroups, and various estimates of future numbers of nonzeros) are used for stashing data that is only for internal SystemML use. Either put these into a different data structure or provide a generic mechanism for tagging a matrix block with additional application-specific data.
> * Clean up and simplify the multiple different initialization methods (different variants of the constructors and the methods init() and reset()). There should be one canonical method for each major type of initialization. Other methods that are shortcuts (i.e. reset() with no arguments) should call the canonical method internally.
> * Consider refactoring the variants of ternaryOperations() that support ctable() into something simpler that is called ctable() – perhaps a Java API that can take null values for the optional arguments. 
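
As a concrete illustration of the proposed "Matrix" facade above, the sketch below shows one way such a class could hold a single MatrixValue pointer and switch representations internally. All class names, methods, and the sparsity threshold are hypothetical, not the proposed final API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal MatrixValue abstraction (illustrative only).
interface MatrixValue {
    double get(int r, int c);
    void set(int r, int c, double v);
    long nonZeros();
}

class DenseValue implements MatrixValue {
    final int rows, cols;
    final double[] data;
    DenseValue(int rows, int cols) {
        this.rows = rows; this.cols = cols; this.data = new double[rows * cols];
    }
    public double get(int r, int c) { return data[r * cols + c]; }
    public void set(int r, int c, double v) { data[r * cols + c] = v; }
    public long nonZeros() {
        long nnz = 0;
        for (double d : data) if (d != 0) nnz++;
        return nnz;
    }
}

class SparseValue implements MatrixValue {
    final int rows, cols;
    final Map<Long, Double> cells = new HashMap<>();
    SparseValue(int rows, int cols) { this.rows = rows; this.cols = cols; }
    public double get(int r, int c) {
        return cells.getOrDefault((long) r * cols + c, 0.0);
    }
    public void set(int r, int c, double v) {
        long k = (long) r * cols + c;
        if (v == 0) cells.remove(k); else cells.put(k, v);
    }
    public long nonZeros() { return cells.size(); }
}

// The public facade: holds one MatrixValue pointer and hides the
// representation switch from callers.
class Matrix {
    private final int rows, cols;
    private MatrixValue value;

    Matrix(int rows, int cols) {
        this.rows = rows; this.cols = cols;
        this.value = new SparseValue(rows, cols); // an empty matrix starts sparse
    }

    double get(int r, int c) { return value.get(r, c); }

    void set(int r, int c, double v) {
        value.set(r, c, v);
        maybeSwitchRepresentation();
    }

    boolean isSparse() { return value instanceof SparseValue; }

    // Illustrative policy: go dense once more than half the cells are non-zero.
    private void maybeSwitchRepresentation() {
        if (isSparse() && value.nonZeros() * 2 > (long) rows * cols) {
            MatrixValue dense = new DenseValue(rows, cols);
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    dense.set(r, c, value.get(r, c));
            value = dense;
        }
    }
}
```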
> Other changes outside MatrixBlock:
> * The matrix classes currently depend on Hadoop I/O classes like Writable and DataInputBuffer. A local linear algebra library really shouldn’t require Hadoop. I/O methods that use Hadoop APIs should be factored out into a separate package. In particular, MatrixValue needs to be separated from Hadoop’s WritableComparable API.
> * The contents of the following packages need to move to the new library: sysml.runtime.functionobjects and sysml.runtime.matrix.operators
> * The library will need local input and output functions. I haven’t found suitable functions yet, but they may be hidden somewhere; if they do exist, they should be moved adjacent to the other local linear algebra code.
> * Utility functions under classes in sysml.runtime.util will need to be replicated.
> * The more obscure subclasses of MatrixValue (MatrixCell, WeightedCell, etc.) do NOT need to be moved over.
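
One way to achieve the Hadoop decoupling described in the list above is to have the core library serialize against plain java.io.DataInput/DataOutput, with a thin Writable adapter living in the separate Hadoop-facing package. A minimal sketch (BlockSerializable, DenseBlock, and the payload layout are illustrative names, not the actual design):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Core-library serialization contract: depends only on java.io, not Hadoop.
interface BlockSerializable {
    void writeExternal(DataOutput out) throws IOException;
    void readExternal(DataInput in) throws IOException;
}

class DenseBlock implements BlockSerializable {
    int rows, cols;
    double[] values;

    DenseBlock() { }
    DenseBlock(int rows, int cols, double[] values) {
        this.rows = rows; this.cols = cols; this.values = values;
    }

    public void writeExternal(DataOutput out) throws IOException {
        out.writeInt(rows);
        out.writeInt(cols);
        for (double v : values) out.writeDouble(v);
    }

    public void readExternal(DataInput in) throws IOException {
        rows = in.readInt();
        cols = in.readInt();
        values = new double[rows * cols];
        for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
    }

    // Serialize to bytes and back; used here to demonstrate the round trip.
    static DenseBlock roundTrip(DenseBlock b) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            b.writeExternal(new DataOutputStream(baos));
            DenseBlock out = new DenseBlock();
            out.readExternal(new DataInputStream(
                new ByteArrayInputStream(baos.toByteArray())));
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// The Hadoop adapter would live in the separate Hadoop-dependent package,
// wrapping the core type, e.g.:
//
// class WritableDenseBlock implements org.apache.hadoop.io.Writable {
//     private final DenseBlock block;
//     WritableDenseBlock(DenseBlock block) { this.block = block; }
//     public void write(DataOutput out) throws IOException { block.writeExternal(out); }
//     public void readFields(DataInput in) throws IOException { block.readExternal(in); }
// }
```

This keeps the core matrix classes free of any Hadoop import while the map-reduce and Spark paths still get Writable semantics through the adapter.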



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)