Posted to commits@spark.apache.org by me...@apache.org on 2015/07/07 17:59:57 UTC
spark git commit: [SPARK-8823] [MLLIB] [PYSPARK] Optimizations for
SparseVector dot products
Repository: spark
Updated Branches:
refs/heads/master 1dbc4a155 -> 738c10748
[SPARK-8823] [MLLIB] [PYSPARK] Optimizations for SparseVector dot products
Follow up for https://github.com/apache/spark/pull/5946
Currently we iterate over the indices and values of both SparseVectors in a Python loop; this can be vectorized with NumPy.
Author: MechCoder <ma...@gmail.com>
Closes #7222 from MechCoder/sparse_optim and squashes the following commits:
dcb51d3 [MechCoder] [SPARK-8823] [MLlib] [PySpark] Optimizations for SparseVector dot product
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/738c1074
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/738c1074
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/738c1074
Branch: refs/heads/master
Commit: 738c10748b49eb8a475d1fd26c6a271ca36497cf
Parents: 1dbc4a1
Author: MechCoder <ma...@gmail.com>
Authored: Tue Jul 7 08:59:52 2015 -0700
Committer: Xiangrui Meng <me...@databricks.com>
Committed: Tue Jul 7 08:59:52 2015 -0700
----------------------------------------------------------------------
python/pyspark/mllib/linalg.py | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/738c1074/python/pyspark/mllib/linalg.py
----------------------------------------------------------------------
diff --git a/python/pyspark/mllib/linalg.py b/python/pyspark/mllib/linalg.py
index 9959a01..12d8dbb 100644
--- a/python/pyspark/mllib/linalg.py
+++ b/python/pyspark/mllib/linalg.py
@@ -590,18 +590,14 @@ class SparseVector(Vector):
                 return np.dot(other.array[self.indices], self.values)
 
         elif isinstance(other, SparseVector):
-            result = 0.0
-            i, j = 0, 0
-            while i < len(self.indices) and j < len(other.indices):
-                if self.indices[i] == other.indices[j]:
-                    result += self.values[i] * other.values[j]
-                    i += 1
-                    j += 1
-                elif self.indices[i] < other.indices[j]:
-                    i += 1
-                else:
-                    j += 1
-            return result
+            # Find out common indices.
+            self_cmind = np.in1d(self.indices, other.indices, assume_unique=True)
+            self_values = self.values[self_cmind]
+            if self_values.size == 0:
+                return 0.0
+            else:
+                other_cmind = np.in1d(other.indices, self.indices, assume_unique=True)
+                return np.dot(self_values, other.values[other_cmind])
 
         else:
             return self.dot(_convert_to_vector(other))
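
The change above can be illustrated outside of PySpark with a small standalone sketch. It is not the code from the commit: the function names and sample arrays are hypothetical, and it uses np.isin (the newer NumPy spelling of the same membership test on 1-D input) instead of np.in1d. It assumes, as SparseVector does, that each index array is sorted and free of duplicates, which is what makes the two masked value arrays line up position by position.

```python
import numpy as np


def sparse_dot_loop(idx_a, val_a, idx_b, val_b):
    """Two-pointer merge over sorted index arrays (the style of the removed code)."""
    result = 0.0
    i, j = 0, 0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            result += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return result


def sparse_dot_vectorized(idx_a, val_a, idx_b, val_b):
    """Mask-based variant (the style of the added code).

    Assumes both index arrays are sorted and contain no duplicates.
    """
    # Boolean mask of the entries of a whose index also occurs in b.
    mask_a = np.isin(idx_a, idx_b, assume_unique=True)
    common_a = val_a[mask_a]
    if common_a.size == 0:
        # No shared indices, so the dot product is zero.
        return 0.0
    # Same test from b's side; sorted order keeps the pairs aligned.
    mask_b = np.isin(idx_b, idx_a, assume_unique=True)
    return float(np.dot(common_a, val_b[mask_b]))


# Hypothetical sample data: the vectors share indices 2 and 7.
idx_a = np.array([0, 2, 5, 7], dtype=np.int32)
val_a = np.array([1.0, 2.0, 3.0, 4.0])
idx_b = np.array([2, 3, 7], dtype=np.int32)
val_b = np.array([10.0, 20.0, 30.0])

loop_result = sparse_dot_loop(idx_a, val_a, idx_b, val_b)
vec_result = sparse_dot_vectorized(idx_a, val_a, idx_b, val_b)
```

Both functions compute 2.0 * 10.0 + 4.0 * 30.0 for the sample data. The vectorized form trades the per-element Python loop for two membership tests and one np.dot, which tends to pay off as the number of stored entries grows; for very small vectors the fixed overhead of the NumPy calls may dominate.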