Posted to user@spark.apache.org by "Devi P.V" <de...@gmail.com> on 2016/01/13 19:16:07 UTC

Optimized way to multiply two large matrices and save output using Spark and Scala

I want to multiply two large matrices (from CSV files) using Spark and Scala
and save the output. I use the following code:

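  import org.apache.spark.mllib.linalg.{Matrices, Vectors}
  import org.apache.spark.mllib.linalg.distributed.RowMatrix
  import org.apache.spark.rdd.RDD

  // The post shows only the method body; this signature is a hypothetical
  // wrapper for the values it uses (file1, file2, delimiter, OutputPath).
  def multiplyFiles(file1: RDD[String], file2: RDD[String],
                    delimiter: String, OutputPath: String): Unit = {
    // Left matrix: one sparse row vector per CSV line, zeros dropped.
    val rows = file1.coalesce(1, false).map { x =>
      val line = x.split(delimiter).map(_.toDouble)
      Vectors.sparse(line.length,
        line.zipWithIndex.map(e => (e._2, e._1)).filter(_._2 != 0.0).toSeq)
    }

    val rmat = new RowMatrix(rows)

    // Right matrix: read as dense rows, then collected to the driver.
    val dm = file2.coalesce(1, false).map { x =>
      val line = x.split(delimiter).map(_.toDouble)
      Vectors.dense(line)
    }

    val ma = dm.map(_.toArray).take(dm.count.toInt)
    // Matrices.dense expects column-major values, hence the transpose.
    val localMat = Matrices.dense(dm.count.toInt,
      dm.take(1)(0).size,
      transpose(ma).flatten)

    // Multiply the two matrices
    val s = rmat.multiply(localMat).rows

    s.map(x => x.toArray.mkString(delimiter)).saveAsTextFile(OutputPath)
  }

  def transpose(m: Array[Array[Double]]): Array[Array[Double]] = {
    (for {
      c <- m(0).indices
    } yield m.map(_(c))).toArray
  }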
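The transpose step is there because Matrices.dense(numRows, numCols, values)
reads `values` in column-major order. A small plain-Scala check on a made-up
2-by-3 matrix (the object name ColumnMajorCheck is hypothetical) shows what
transpose(ma).flatten produces:

```scala
// Hypothetical standalone check; no Spark needed.
object ColumnMajorCheck {
  // Same helper as in the post: turns rows into columns.
  def transpose(m: Array[Array[Double]]): Array[Array[Double]] =
    (for (c <- m(0).indices) yield m.map(_(c))).toArray

  def main(args: Array[String]): Unit = {
    // Made-up 2 x 3 matrix, stored row by row as it would be read from CSV.
    val ma = Array(
      Array(1.0, 2.0, 3.0),
      Array(4.0, 5.0, 6.0))
    // Column-major layout: column 0, then column 1, then column 2.
    val values = transpose(ma).flatten
    println(values.mkString(","))
  }
}
```

It prints 1.0,4.0,2.0,5.0,3.0,6.0: column 0 (1, 4), then column 1 (2, 5),
then column 2 (3, 6).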

When I save the file, it takes a long time and the output file is very
large. What is the optimized way to multiply two large matrices from files
and save the output to a text file?

Re: Optimized way to multiply two large matrices and save output using Spark and Scala

Posted by Burak Yavuz <br...@gmail.com>.
BlockMatrix.multiply is the suggested method of multiplying two large
matrices. Is there a reason that you didn't use BlockMatrices?
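A minimal sketch of the BlockMatrix route for dense CSV files like the ones
in the question, assuming the RDD-based MLlib `distributed` package (Spark
1.3 or later) and one matrix row per line. The object and method names and
the 1024 x 1024 block size are illustrative assumptions, not part of the
reply:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, IndexedRow, IndexedRowMatrix}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; names and block size are illustrative.
object BlockMultiplySketch {
  // Each CSV line becomes one dense row; zipWithIndex supplies the row
  // index (the line's position in the file) that BlockMatrix needs.
  def loadBlockMatrix(sc: SparkContext, path: String, delimiter: String): BlockMatrix = {
    val rows: RDD[IndexedRow] = sc.textFile(path).zipWithIndex().map { case (line, i) =>
      IndexedRow(i, Vectors.dense(line.split(delimiter).map(_.toDouble)))
    }
    // 1024 x 1024 blocks for both operands, so their block sizes line up.
    new IndexedRowMatrix(rows).toBlockMatrix(1024, 1024).cache()
  }

  def multiplyAndSave(sc: SparkContext, pathA: String, pathB: String,
                      outputPath: String, delimiter: String): Unit = {
    val a = loadBlockMatrix(sc, pathA, delimiter)
    val b = loadBlockMatrix(sc, pathB, delimiter)
    // Distributed block-by-block product; neither operand is collected.
    val product: BlockMatrix = a.multiply(b)
    // Convert back to indexed rows and sort by index so the text output
    // keeps row order. Rows that are entirely zero may be absent here.
    product.toIndexedRowMatrix().rows
      .sortBy(_.index)
      .map(_.vector.toArray.mkString(delimiter))
      .saveAsTextFile(outputPath)
  }
}
```

Unlike the code in the question, neither input is coalesced into a single
partition or collected to the driver, so the work and the output files are
spread across partitions.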

You can load the matrices and convert them to and from RowMatrix (going
through IndexedRowMatrix, which keeps the row indices BlockMatrix needs). If
the data is in sparse (i, j, v) format, you can also use CoordinateMatrix to
load it, BlockMatrix to multiply, and CoordinateMatrix to save it back again.
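For data stored as (i, j, v) triples, a comparable sketch of the
CoordinateMatrix round trip. It assumes one comma-separated, zero-based
"i,j,v" entry per line; the object and method names are hypothetical:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Hypothetical helper object; assumes zero-based "i,j,v" lines.
object CoordinateMultiplySketch {
  // Dimensions are inferred from the largest indices seen. If trailing
  // rows or columns can be all zero, pass them explicitly instead:
  // new CoordinateMatrix(entries, numRows, numCols).
  def loadEntries(sc: SparkContext, path: String): CoordinateMatrix = {
    val entries = sc.textFile(path).map { line =>
      val parts = line.split(",")
      MatrixEntry(parts(0).trim.toLong, parts(1).trim.toLong, parts(2).trim.toDouble)
    }
    new CoordinateMatrix(entries)
  }

  def multiplyAndSave(sc: SparkContext, pathA: String, pathB: String,
                      outputPath: String): Unit = {
    // toBlockMatrix() defaults to 1024 x 1024 blocks, so both sides match.
    val a = loadEntries(sc, pathA).toBlockMatrix().cache()
    val b = loadEntries(sc, pathB).toBlockMatrix().cache()
    val product = a.multiply(b).toCoordinateMatrix()
    // Save in the same sparse "i,j,v" text format it was read from.
    product.entries
      .map(e => s"${e.i},${e.j},${e.value}")
      .saveAsTextFile(outputPath)
  }
}
```

Writing only (i, j, v) entries also keeps the output smaller than a dense
text dump when the product has many zeros.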

Thanks,
Burak
