Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2014/11/17 20:08:33 UTC

[jira] [Commented] (SPARK-4409) Additional (but limited) Linear Algebra Utils

    [ https://issues.apache.org/jira/browse/SPARK-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214999#comment-14214999 ] 

Apache Spark commented on SPARK-4409:
-------------------------------------

User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/3319

> Additional (but limited) Linear Algebra Utils
> ---------------------------------------------
>
>                 Key: SPARK-4409
>                 URL: https://issues.apache.org/jira/browse/SPARK-4409
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Burak Yavuz
>            Priority: Minor
>
> This ticket is to discuss the addition of a very limited number of local matrix manipulation and generation methods that would help the further development of algorithms on top of BlockMatrix (SPARK-3974), such as Randomized SVD and Multi Model Training (SPARK-1486).
> The proposed methods for addition are:
> For `Matrix`
>  -  map: maps the values in the matrix with a given function. Produces a new matrix.
>  -  update: updates the values in the matrix with a given function. Occurs in place.
> Factory methods for `DenseMatrix`:
>  -  *zeros: Generate a matrix consisting of zeros
>  -  *ones: Generate a matrix consisting of ones
>  -  *eye: Generate an identity matrix
>  -  *rand: Generate a matrix consisting of i.i.d. uniform random numbers
>  -  *randn: Generate a matrix consisting of i.i.d. Gaussian random numbers
>  -  *diag: Generate a diagonal matrix from a supplied vector
> *These methods already exist as factory methods on `Matrices`, but they return a generic `Matrix`. In cases where we require a `DenseMatrix`, you constantly have to add `.asInstanceOf[DenseMatrix]` everywhere, which makes the code "dirtier". I propose moving these functions to factory methods on `DenseMatrix`, where the output will be a `DenseMatrix`; the factory methods on `Matrices` will call these functions directly and return a generic `Matrix`.
> Factory methods for `SparseMatrix`:
>  -  speye: Identity matrix in sparse format. Saves a lot of memory when dimensions are large, especially in Multi Model Training, where each row needs to be multiplied by a scalar.
>  -  sprand: Generate a sparse matrix with a given density consisting of i.i.d. uniform random numbers.
>  -  sprandn: Generate a sparse matrix with a given density consisting of i.i.d. Gaussian random numbers.
>  -  diag: Generate a diagonal matrix from a supplied vector. Memory efficient, because it stores only the diagonal. Again, very helpful in Multi Model Training.
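To make the memory argument for speye and the sparse diag concrete, here is a minimal sketch in plain Python of a diagonal matrix held in compressed sparse column (CSC) form. The `CscMatrix` class and the `sp_diag` / `sp_eye` helpers are hypothetical names for illustration only; they are not Spark's `SparseMatrix` or the API proposed in the ticket. The point is that an n x n diagonal matrix needs only n stored values instead of n * n.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CscMatrix:
    """Hypothetical minimal CSC container (illustration only, not Spark's SparseMatrix)."""
    num_rows: int
    num_cols: int
    col_ptrs: List[int]     # length num_cols + 1; column j owns entries col_ptrs[j]..col_ptrs[j+1]-1
    row_indices: List[int]  # row index of each stored entry
    values: List[float]     # the stored entries themselves

    def get(self, i: int, j: int) -> float:
        # Scan the stored entries of column j for row i; absent entries are zero.
        for k in range(self.col_ptrs[j], self.col_ptrs[j + 1]):
            if self.row_indices[k] == i:
                return self.values[k]
        return 0.0


def sp_diag(diagonal: List[float]) -> CscMatrix:
    """Build an n x n diagonal matrix that stores exactly one entry per column."""
    n = len(diagonal)
    # Column j holds a single entry, located at row j.
    return CscMatrix(n, n, list(range(n + 1)), list(range(n)), list(diagonal))


def sp_eye(n: int) -> CscMatrix:
    """Identity matrix as a special case of sp_diag."""
    return sp_diag([1.0] * n)
```

With this layout, `sp_eye(1000)` holds 1,000 values plus index arrays of similar length, where a dense identity of the same size would hold 1,000,000 values. Note that this sketch stores every diagonal entry, including any zeros in the supplied vector.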
> Factory methods for `Matrices`:
>  -  Include all the factory methods given above, but return a generic `Matrix` rather than `SparseMatrix` or `DenseMatrix`.
>  -  horzCat: Horizontally concatenate matrices to form one larger matrix. Very useful both in Multi Model Training and for the repartitioning of BlockMatrix.
>  -  vertCat: Vertically concatenate matrices to form one larger matrix. Very useful for the repartitioning of BlockMatrix.
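As a rough illustration of what horzCat and vertCat involve for dense matrices, here is a minimal sketch in plain Python. It assumes column-major storage, which is how MLlib's `DenseMatrix` lays out its values; the `Dense` tuple and the `horz_cat` / `vert_cat` functions are hypothetical stand-ins, not the implementation from the pull request. In column-major order, horizontal concatenation is just appending the value arrays, while vertical concatenation has to interleave the inputs column by column.

```python
from typing import List, Tuple

# Hypothetical stand-in for a dense matrix:
# (num_rows, num_cols, values in column-major order)
Dense = Tuple[int, int, List[float]]


def horz_cat(mats: List[Dense]) -> Dense:
    """Place matrices side by side; all inputs must have the same row count."""
    if not mats:
        raise ValueError("need at least one matrix")
    rows = mats[0][0]
    if any(m[0] != rows for m in mats):
        raise ValueError("all matrices must have the same number of rows")
    values: List[float] = []
    for _, _, vals in mats:
        # Columns are contiguous in column-major order, so appending is enough.
        values.extend(vals)
    return rows, sum(m[1] for m in mats), values


def vert_cat(mats: List[Dense]) -> Dense:
    """Stack matrices on top of each other; all inputs must have the same column count."""
    if not mats:
        raise ValueError("need at least one matrix")
    cols = mats[0][1]
    if any(m[1] != cols for m in mats):
        raise ValueError("all matrices must have the same number of columns")
    values: List[float] = []
    for j in range(cols):
        # Output column j is column j of each input, taken in order.
        for r, _, vals in mats:
            values.extend(vals[j * r:(j + 1) * r])
    return sum(m[0] for m in mats), cols, values
```

For example, with `a = (2, 2, [1.0, 2.0, 3.0, 4.0])` (columns `[1, 2]` and `[3, 4]`) and `b = (2, 1, [5.0, 6.0])`, `horz_cat([a, b])` yields a 2 x 3 matrix whose values are the two arrays appended in order.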
