Posted to user@spark.apache.org by Pasquinell Urbani <pa...@exalitica.com> on 2016/06/23 13:12:52 UTC

Change from distributed.MatrixEntry to Vector

Hello all,

I have to build an item-based recommendation system. First, I obtained the
similarity matrix using the DIMSUM cosine-similarity approach from Twitter
(https://blog.twitter.com/2014/all-pairs-similarity-via-dimsum). The
entries of the similarity matrix are in the following format:
org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.distributed.MatrixEntry].
The matrix is named simsEstimate and is obtained from the following code:


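    import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

    // M, U, NNZ and NUMCHUNKS are Int constants defined elsewhere (not shown).
    val R = sc.parallelize(0 until M, NUMCHUNKS).flatMap{i =>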
      val inds = new scala.collection.mutable.TreeSet[Int]()
      while (inds.size < NNZ) {
        inds += scala.util.Random.nextInt(U)
      }
      inds.toArray.map(j => MatrixEntry(i, j, scala.math.random))
    }
    val mat = new CoordinateMatrix(R, M, U).toRowMatrix()

    val simsEstimate = mat.columnSimilarities(0.8)


After this, I need to apply an ElementwiseProduct to the columns of the
similarity matrix, but it requires its input to be in
org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] format.


Can anybody tell me how to convert the MatrixEntry format so that the
component vectors of the matrix are in
org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] format?
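
To make the question concrete, here is a minimal sketch of one possible
conversion. It is an illustration under stated assumptions, not a confirmed
answer. The helper name columnVectors is hypothetical (it is not part of
MLlib). It assumes that the input holds only the upper triangle of the
similarity matrix, which is what columnSimilarities is documented to
return, and that the number of columns fits in an Int.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.rdd.RDD

// Hypothetical helper: builds one sparse Vector per column of a similarity
// matrix, keyed by the column index.
def columnVectors(sims: CoordinateMatrix): RDD[(Long, Vector)] = {
  val n = sims.numCols().toInt
  sims.entries
    .flatMap { case MatrixEntry(i, j, v) =>
      // Assumption: only the upper triangle is present, so each off-diagonal
      // entry is mirrored to make every column complete. If the input were
      // already symmetric, this would create duplicate indices.
      if (i == j) Seq((j, (i.toInt, v)))
      else Seq((j, (i.toInt, v)), (i, (j.toInt, v)))
    }
    .groupByKey()
    // Vectors.sparse accepts unordered (index, value) pairs.
    .mapValues(elems => Vectors.sparse(n, elems.toSeq))
}
```

With the code above, columnVectors(simsEstimate).values would have type
RDD[Vector]. If only the rows of the upper-triangular matrix are needed,
simsEstimate.toRowMatrix().rows also yields an RDD[Vector], but it drops
the row indices.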