Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/03/17 11:15:33 UTC

[jira] [Assigned] (SPARK-13970) Add Non-Negative Matrix Factorization to MLlib

     [ https://issues.apache.org/jira/browse/SPARK-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-13970:
------------------------------------

    Assignee: Apache Spark

> Add Non-Negative Matrix Factorization to MLlib
> ----------------------------------------------
>
>                 Key: SPARK-13970
>                 URL: https://issues.apache.org/jira/browse/SPARK-13970
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: zhengruifeng
>            Assignee: Apache Spark
>            Priority: Minor
>
> NMF finds two non-negative matrices (W, H) whose product W * H.T approximates a non-negative matrix X. This factorization can be used, for example, for dimensionality reduction, source separation, or topic extraction.
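> In other words (a standard formulation of the same problem): find W >= 0 and H >= 0 minimizing the squared reconstruction error ||X - W * H.T||_F^2, which is the quantity computed as `loss` at the end of the example below.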
> NMF is already implemented in several packages:
> Scikit-Learn (http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html#sklearn.decomposition.NMF)
> R-NMF (https://cran.r-project.org/web/packages/NMF/index.html)
> LibNMF (http://www.univie.ac.at/rlcta/software/)
> I have implemented it in MLlib based on the following papers:
> Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce (http://research.microsoft.com/pubs/119077/DNMF.pdf)
> Algorithms for Non-negative Matrix Factorization (http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf)
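> For intuition, the core computation is the multiplicative update rule from the Lee & Seung paper above. Below is a minimal single-machine sketch using Breeze (illustration only, not the proposed API; the proposed MLlib version distributes these updates, following the DNMF paper above):
> import breeze.linalg.{DenseMatrix => BDM}
>
> // Factorize a non-negative matrix X (m x n) as W (m x k) * H (n x k).t
> // using Lee & Seung's multiplicative updates.
> def localNMF(x: BDM[Double], k: Int, maxIter: Int): (BDM[Double], BDM[Double]) = {
>   val eps = 1e-9                   // guard against division by zero
>   var w = BDM.rand(x.rows, k)      // random non-negative initialization
>   var h = BDM.rand(x.cols, k)
>   for (_ <- 0 until maxIter) {
>     // H <- H .* (X^T W) ./ (H (W^T W))
>     h = h *:* ((x.t * w) /:/ (h * (w.t * w)).map(_ + eps))
>     // W <- W .* (X H) ./ (W (H^T H))
>     w = w *:* ((x * h) /:/ (w * (h.t * h)).map(_ + eps))
>   }
>   (w, h)
> }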
> It can be used like this:
> import org.apache.spark.mllib.linalg.{DenseMatrix, Vectors}
> import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
> val m = 4
> val n = 3
> val data = Seq(
>     (0L, Vectors.dense(0.0, 1.0, 2.0)),
>     (1L, Vectors.dense(3.0, 4.0, 5.0)),
>     (3L, Vectors.dense(9.0, 0.0, 1.0))
>   ).map(x => IndexedRow(x._1, x._2))
> val indexedRows = sc.parallelize(data)
> val A = new IndexedRowMatrix(indexedRows).toCoordinateMatrix()
> val k = 2
> // run the NMF algorithm: factorize A into rank-k factors W and H, 10 iterations
> val r = NMF.solve(A, k, 10)
> val rW = r.W.toBlockMatrix().toLocalMatrix().asInstanceOf[DenseMatrix]
> >>> org.apache.spark.mllib.linalg.DenseMatrix =
> 1.1349295096806706  1.4423101890626953E-5
> 3.453054133110303   0.46312492493865615
> 0.0                 0.0
> 0.3133764134585149  2.70684017255672
> val rH = r.H.toBlockMatrix().toLocalMatrix().asInstanceOf[DenseMatrix]
> >>> org.apache.spark.mllib.linalg.DenseMatrix =
> 0.4184163313845057  3.2719352525149286
> 1.12188012613645    0.002939823716977737
> 1.456499371939653   0.18992996116069297
> val R = rW.multiply(rH.transpose)
> >>> org.apache.spark.mllib.linalg.DenseMatrix =
> 0.4749202332761286  1.273254903877907    1.6530268574248572
> 2.9601290106732367  3.8752743120480346   5.117332475154927
> 0.0                 0.0                  0.0
> 8.987727592773672   0.35952840319637736  0.9705425982249293
> val AD = A.toBlockMatrix().toLocalMatrix()
> >>> org.apache.spark.mllib.linalg.Matrix =
> 0.0  1.0  2.0
> 3.0  4.0  5.0
> 0.0  0.0  0.0
> 9.0  0.0  1.0
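> // squared Frobenius norm of the reconstruction error, ||A - W * H.T||_F^2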
> var loss = 0.0
> for(i <- 0 until AD.numRows; j <- 0 until AD.numCols) {
>    val diff = AD(i, j) - R(i, j)
>    loss += diff * diff
> }
> loss
> >>> Double = 0.5817999580912183



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org