Posted to commits@spark.apache.org by jk...@apache.org on 2016/01/14 03:01:34 UTC
spark git commit: [SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example
Repository: spark
Updated Branches:
refs/heads/master 021dafc6a -> 20d8ef858
[SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example
Fixed the WSSSE computation in the Python MLlib KMeans user guide example by using the new computeCost method from the Python API instead of computing the cost by hand.
Author: Joseph K. Bradley <jo...@databricks.com>
Closes #10707 from jkbradley/kmeans-doc-fix.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/20d8ef85
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/20d8ef85
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/20d8ef85
Branch: refs/heads/master
Commit: 20d8ef858af6e13db59df118b562ea33cba5464d
Parents: 021dafc
Author: Joseph K. Bradley <jo...@databricks.com>
Authored: Wed Jan 13 18:01:29 2016 -0800
Committer: Joseph K. Bradley <jo...@databricks.com>
Committed: Wed Jan 13 18:01:29 2016 -0800
----------------------------------------------------------------------
docs/mllib-clustering.md | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/20d8ef85/docs/mllib-clustering.md
----------------------------------------------------------------------
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 93cd0c1..d0be032 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -152,11 +152,7 @@ clusters = KMeans.train(parsedData, 2, maxIterations=10,
runs=10, initializationMode="random")
# Evaluate clustering by computing Within Set Sum of Squared Errors
-def error(point):
- center = clusters.centers[clusters.predict(point)]
- return sqrt(sum([x**2 for x in (point - center)]))
-
-WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
+WSSSE = clusters.computeCost(parsedData)
print("Within Set Sum of Squared Error = " + str(WSSSE))
# Save and load model
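For reference, the diff above replaces a hand-rolled map/reduce over the RDD with a single call to the model's computeCost method. A minimal pure-Python sketch of what that method returns (no Spark required; the point and center values below are made up for illustration) is the Within Set Sum of Squared Errors: for each point, the squared Euclidean distance to its nearest cluster center, summed over all points. Note that the removed snippet applied sqrt before summing, so it actually summed plain distances rather than squared errors, which is part of what this fix addresses.

```python
def wssse(points, centers):
    """Within Set Sum of Squared Errors.

    For each point, take the squared Euclidean distance to its
    nearest center, then sum over all points. This is what
    KMeansModel.computeCost returns in a single call.
    """
    total = 0.0
    for p in points:
        # squared distance from p to its nearest center
        d2 = min(sum((x - c) ** 2 for x, c in zip(p, center))
                 for center in centers)
        total += d2
    return total

# Hypothetical 2-D data and centers, chosen so the result is easy to check:
# each point is 0.5 (squared distance) from its nearest center.
points = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
centers = [(0.5, 0.5), (8.5, 8.5)]
print(wssse(points, centers))  # 2.0
```

In the actual user guide example, the equivalent one-liner is `WSSSE = clusters.computeCost(parsedData)`, where `clusters` is the trained KMeansModel and `parsedData` is the input RDD.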
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org