Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/08/16 18:06:00 UTC

[jira] [Resolved] (SPARK-21680) ML/MLLIB Vector compressed optimization

     [ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-21680.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0

Issue resolved by pull request 18899
[https://github.com/apache/spark/pull/18899]

> ML/MLLIB Vector compressed optimization
> ---------------------------------------
>
>                 Key: SPARK-21680
>                 URL: https://issues.apache.org/jira/browse/SPARK-21680
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.3.0
>            Reporter: Peng Meng
>             Fix For: 2.3.0
>
>
> When Vector.compressed is used to convert a Vector to a SparseVector, it is much slower than calling Vector.toSparse directly.
> This is because Vector.compressed scans the values three times, whereas Vector.toSparse needs only two scans.
> When the vector is long, the performance difference between the two methods is significant.
> Current code of Vector.compressed:
> {code:java}
>   def compressed: Vector = {
>     val nnz = numNonzeros
>     // A dense vector needs 8 * size + 8 bytes, while a sparse vector needs 12 * nnz + 20 bytes.
>     if (1.5 * (nnz + 1.0) < size) {
>       toSparse
>     } else {
>       toDense
>     }
>   }
> {code}
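> The size test in this code follows from comparing the two storage estimates given in the comment. A short derivation, assuming only those byte counts (sparse storage is preferred when it is strictly smaller):
> ```latex
> \begin{align*}
> 12\,\mathrm{nnz} + 20 &< 8\,\mathrm{size} + 8 \\
> 12\,\mathrm{nnz} + 12 &< 8\,\mathrm{size} \\
> 1.5\,(\mathrm{nnz} + 1) &< \mathrm{size}
> \end{align*}
> ```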
> I propose to change it to:
> {code:java}
>   // Count the non-zeros once, then fill the sparse arrays directly
>   // instead of calling toSparse, which would count them again.
>   def compressed: Vector = {
>     val nnz = numNonzeros
>     // A dense vector needs 8 * size + 8 bytes, while a sparse vector needs 12 * nnz + 20 bytes.
>     if (1.5 * (nnz + 1.0) < size) {
>       val ii = new Array[Int](nnz)
>       val vv = new Array[Double](nnz)
>       var k = 0
>       foreachActive { (i, v) =>
>         if (v != 0) {
>           ii(k) = i
>           vv(k) = v
>           k += 1
>         }
>       }
>       new SparseVector(size, ii, vv)
>     } else {
>       toDense
>     }
>   }
> {code}
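
The scan counts described in the issue can be illustrated without any Spark dependency. The following is a minimal sketch, not the code merged in the pull request: `SparseVec`, `CompressedSketch`, and `toSparseWithSize` are hypothetical names chosen for this example, and a plain `Array[Double]` stands in for a dense vector. The point it shows is that once `nnz` has been counted, the sparse arrays can be filled in one further scan, for two scans in total.

```scala
// Hypothetical stand-in for a sparse vector; this is NOT Spark's SparseVector.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double])

object CompressedSketch {

  /** Scan 1: count the non-zero entries. */
  def numNonzeros(values: Array[Double]): Int = {
    var nnz = 0
    var i = 0
    while (i < values.length) {
      if (values(i) != 0.0) nnz += 1
      i += 1
    }
    nnz
  }

  /** Scan 2: fill the index and value arrays, reusing an already-known nnz. */
  def toSparseWithSize(values: Array[Double], nnz: Int): SparseVec = {
    val ii = new Array[Int](nnz)
    val vv = new Array[Double](nnz)
    var k = 0
    var i = 0
    while (i < values.length) {
      val v = values(i)
      if (v != 0.0) {
        ii(k) = i
        vv(k) = v
        k += 1
      }
      i += 1
    }
    SparseVec(values.length, ii, vv)
  }

  /** Left = sparse form, Right = the dense array unchanged; same size test as in the issue. */
  def compressed(values: Array[Double]): Either[SparseVec, Array[Double]] = {
    val nnz = numNonzeros(values)
    if (1.5 * (nnz + 1.0) < values.length) Left(toSparseWithSize(values, nnz))
    else Right(values)
  }
}
```

For example, `CompressedSketch.compressed(Array(0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 5.0, 0.0))` takes the sparse branch, because 1.5 * (2 + 1) = 4.5 is less than the length 8, while `Array(1.0, 2.0, 0.0)` stays dense.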


