Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/08/16 18:06:00 UTC
[jira] [Resolved] (SPARK-21680) ML/MLLIB Vector compressed optimization
[ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-21680.
-------------------------------
Resolution: Fixed
Fix Version/s: 2.3.0
Issue resolved by pull request 18899
[https://github.com/apache/spark/pull/18899]
> ML/MLLIB Vector compressed optimization
> ---------------------------------------
>
> Key: SPARK-21680
> URL: https://issues.apache.org/jira/browse/SPARK-21680
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Affects Versions: 2.3.0
> Reporter: Peng Meng
> Fix For: 2.3.0
>
>
> When Vector.compressed is used to convert a Vector to a SparseVector, it is much slower than calling Vector.toSparse directly.
> This is because Vector.compressed first calls numNonzeros and then toSparse, which counts the nonzeros again before copying them, so the values are scanned three times. Calling Vector.toSparse directly needs only two scans.
> When the vector is long, the performance difference between the two methods is significant.
> Current code of Vector.compressed:
> {code:java}
> def compressed: Vector = {
>   val nnz = numNonzeros
>   // A dense vector needs 8 * size + 8 bytes, while a sparse vector needs 12 * nnz + 20 bytes.
>   if (1.5 * (nnz + 1.0) < size) {
>     toSparse
>   } else {
>     toDense
>   }
> }
> {code}
> I propose to change it to:
> {code:java}
> // Builds the sparse vector directly, reusing the nnz computed for the size check,
> // so the values are not counted a second time.
> def compressed: Vector = {
>   val nnz = numNonzeros
>   // A dense vector needs 8 * size + 8 bytes, while a sparse vector needs 12 * nnz + 20 bytes.
>   if (1.5 * (nnz + 1.0) < size) {
>     val ii = new Array[Int](nnz)
>     val vv = new Array[Double](nnz)
>     var k = 0
>     foreachActive { (i, v) =>
>       if (v != 0) {
>         ii(k) = i
>         vv(k) = v
>         k += 1
>       }
>     }
>     new SparseVector(size, ii, vv)
>   } else {
>     toDense
>   }
> }
> {code}
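
The proposal above can be tried outside Spark with a minimal sketch. Everything here is an assumption made for illustration: CompressedSketch, Vec, Dense and Sparse are hypothetical, simplified stand-ins and not Spark's Vector, DenseVector or SparseVector classes, and the input is a plain Array[Double]. The sketch only shows the two-scan idea (count once, then fill preallocated arrays) together with the storage rule from the issue, where 12 * nnz + 20 < 8 * size + 8 simplifies to 1.5 * (nnz + 1) < size.

```scala
// Hypothetical, simplified stand-ins for illustration; not Spark's real classes.
object CompressedSketch {
  sealed trait Vec { def size: Int }

  final case class Dense(values: Array[Double]) extends Vec {
    def size: Int = values.length
  }

  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double]) extends Vec

  // Scan 1: count the nonzero entries.
  def numNonzeros(values: Array[Double]): Int = values.count(_ != 0.0)

  def compressed(values: Array[Double]): Vec = {
    val size = values.length
    val nnz = numNonzeros(values)
    // Sparse storage (12 * nnz + 20 bytes) beats dense storage (8 * size + 8 bytes)
    // exactly when 1.5 * (nnz + 1) < size.
    if (1.5 * (nnz + 1.0) < size) {
      val ii = new Array[Int](nnz)
      val vv = new Array[Double](nnz)
      var k = 0
      var i = 0
      // Scan 2: fill the preallocated arrays, reusing nnz instead of recounting.
      while (i < size) {
        val v = values(i)
        if (v != 0.0) {
          ii(k) = i
          vv(k) = v
          k += 1
        }
        i += 1
      }
      Sparse(size, ii, vv)
    } else {
      // Copy so the result does not share storage with the input.
      Dense(values.clone())
    }
  }
}
```

With this sketch, an 8-element input holding two nonzeros takes the sparse branch (1.5 * 3 = 4.5 < 8), while a 3-element input holding two nonzeros stays dense (4.5 is not less than 3).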