Posted to commits@spark.apache.org by jk...@apache.org on 2016/01/14 03:01:34 UTC

spark git commit: [SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example

Repository: spark
Updated Branches:
  refs/heads/master 021dafc6a -> 20d8ef858


[SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example

Fixed the WSSSE computation in the Python MLlib KMeans user guide example by using the new computeCost method in the Python API.

Author: Joseph K. Bradley <jo...@databricks.com>

Closes #10707 from jkbradley/kmeans-doc-fix.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/20d8ef85
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/20d8ef85
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/20d8ef85

Branch: refs/heads/master
Commit: 20d8ef858af6e13db59df118b562ea33cba5464d
Parents: 021dafc
Author: Joseph K. Bradley <jo...@databricks.com>
Authored: Wed Jan 13 18:01:29 2016 -0800
Committer: Joseph K. Bradley <jo...@databricks.com>
Committed: Wed Jan 13 18:01:29 2016 -0800

----------------------------------------------------------------------
 docs/mllib-clustering.md | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/20d8ef85/docs/mllib-clustering.md
----------------------------------------------------------------------
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 93cd0c1..d0be032 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -152,11 +152,7 @@ clusters = KMeans.train(parsedData, 2, maxIterations=10,
         runs=10, initializationMode="random")
 
 # Evaluate clustering by computing Within Set Sum of Squared Errors
-def error(point):
-    center = clusters.centers[clusters.predict(point)]
-    return sqrt(sum([x**2 for x in (point - center)]))
-
-WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
+WSSSE = clusters.computeCost(parsedData)
 print("Within Set Sum of Squared Error = " + str(WSSSE))
 
 # Save and load model
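For context, KMeansModel.computeCost returns the Within Set Sum of Squared Errors (WSSSE): for each point, the squared Euclidean distance to its nearest cluster center, summed over all points. The removed error() helper instead summed plain distances (sqrt of the squared sum), which is not the same quantity. A minimal pure-Python sketch of what computeCost evaluates, with hypothetical data and helper name and no Spark dependency:

```python
def wssse(points, centers):
    """Sum over all points of the squared Euclidean distance
    to the nearest center -- the quantity computeCost returns."""
    total = 0.0
    for p in points:
        # Squared distance to the closest center for this point.
        total += min(
            sum((x - c) ** 2 for x, c in zip(p, center))
            for center in centers
        )
    return total

# Hypothetical toy data: two tight clusters in 2-D.
points = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0)]
centers = [(0.5, 0.5), (9.0, 8.0)]
print(wssse(points, centers))  # 1.0
```

With a trained pyspark.mllib KMeansModel, the equivalent single call is clusters.computeCost(parsedData), as the updated guide shows.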


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org