Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2017/11/06 15:52:58 UTC
[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...
GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/19676
[SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to examples
## What changes were proposed in this pull request?
SPARK-14516 introduced `ClusteringEvaluator`, but the documentation does not reference it, and the examples still rely on the sum of squared errors to show how to evaluate a clustering model.
This PR adds `ClusteringEvaluator` to the examples.
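To make the difference between the two metrics concrete, here is a minimal plain-Java sketch that computes both on a toy dataset. It does not use Spark and is not Spark's implementation: the class `SilhouetteSketch`, the methods `wssse` and `meanSilhouette`, and the four sample points are hypothetical names and data chosen for illustration, and the silhouette follows the textbook definition with squared Euclidean distance.

```java
public class SilhouetteSketch {

    // Squared Euclidean distance between two points of equal dimension.
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Within Set Sum of Squared Errors: the squared distance of every point
    // to the centroid (mean) of its assigned cluster, summed over all points.
    public static double wssse(double[][] points, int[] labels, int k) {
        int dim = points[0].length;
        double[][] centroids = new double[k][dim];
        int[] counts = new int[k];
        for (int i = 0; i < points.length; i++) {
            counts[labels[i]]++;
            for (int j = 0; j < dim; j++) {
                centroids[labels[i]][j] += points[i][j];
            }
        }
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < dim; j++) {
                if (counts[c] > 0) {
                    centroids[c][j] /= counts[c];
                }
            }
        }
        double cost = 0.0;
        for (int i = 0; i < points.length; i++) {
            cost += squaredDistance(points[i], centroids[labels[i]]);
        }
        return cost;
    }

    // Mean silhouette over all points. For point i, a is the mean distance to
    // the other points of its own cluster and b is the smallest mean distance
    // to the points of any other cluster; s(i) = (b - a) / max(a, b).
    // Points in singleton clusters contribute 0 (an assumption of this sketch).
    public static double meanSilhouette(double[][] points, int[] labels, int k) {
        int n = points.length;
        int[] counts = new int[k];
        for (int label : labels) {
            counts[label]++;
        }
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            if (counts[labels[i]] <= 1) {
                continue; // singleton cluster: silhouette is taken as 0
            }
            double[] sumTo = new double[k];
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    sumTo[labels[j]] += squaredDistance(points[i], points[j]);
                }
            }
            double a = sumTo[labels[i]] / (counts[labels[i]] - 1);
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                if (c != labels[i] && counts[c] > 0) {
                    b = Math.min(b, sumTo[c] / counts[c]);
                }
            }
            if (Double.isInfinite(b)) {
                continue; // no other non-empty cluster: contributes 0
            }
            double max = Math.max(a, b);
            total += (max == 0.0) ? 0.0 : (b - a) / max;
        }
        return total / n;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: two tight, well-separated clusters.
        double[][] points = {{0, 0}, {0, 1}, {10, 0}, {10, 1}};
        int[] labels = {0, 0, 1, 1};
        System.out.println("Within Set Sum of Squared Errors = " + wssse(points, labels, 2));
        System.out.println("Silhouette with squared Euclidean distance = "
            + meanSilhouette(points, labels, 2));
    }
}
```

On this toy data the WSSSE is 1.0 and the mean silhouette is roughly 0.99. The WSSSE depends on the scale of the features and the number of points, whereas the silhouette always lies between -1 and 1, which makes it easier to read as a quality score in an example.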
## How was this patch tested?
Manual runs of the examples.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-14516_examples
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19676.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19676
----
commit 4c4f83e97d9bd2d8771452498581bf9ce43bd28d
Author: Marco Gaido <mg...@hortonworks.com>
Date: 2017-11-06T15:49:17Z
[SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to examples
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19676
**[Test build #83500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83500/testReport)** for PR 19676 at commit [`4c4f83e`](https://github.com/apache/spark/commit/4c4f83e97d9bd2d8771452498581bf9ce43bd28d).
---
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19676
Sorry for pinging you, @yanboliang. What do you think about adding `ClusteringEvaluator` to the examples? Thanks.
---
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19676
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19676#discussion_r155928871
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,14 @@ public static void main(String[] args) {
KMeans kmeans = new KMeans().setK(2).setSeed(1L);
KMeansModel model = kmeans.fit(dataset);
- // Evaluate clustering by computing Within Set Sum of Squared Errors.
- double WSSSE = model.computeCost(dataset);
- System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+ // Make predictions
+ Dataset<Row> predictions = model.transform(dataset);
+
+ // Evaluate clustering by computing Silhouette score
+ ClusteringEvaluator evaluator = new ClusteringEvaluator();
+
+ double silhouette = evaluator.evaluate(predictions);
+ System.out.println("Silhouette with squared euclidean distance = " + silhouette);
--- End diff --
euclidean -> Euclidean, but it's not important to change unless you're touching the code again anyway.
---
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19676
**[Test build #84681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84681/testReport)** for PR 19676 at commit [`feb619d`](https://github.com/apache/spark/commit/feb619d657f6ff66dec240ee4619e6f53208ac18).
---
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19676
Merged to master
---
[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19676
---
[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19676#discussion_r155929522
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,14 @@ public static void main(String[] args) {
KMeans kmeans = new KMeans().setK(2).setSeed(1L);
KMeansModel model = kmeans.fit(dataset);
- // Evaluate clustering by computing Within Set Sum of Squared Errors.
- double WSSSE = model.computeCost(dataset);
- System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+ // Make predictions
+ Dataset<Row> predictions = model.transform(dataset);
+
+ // Evaluate clustering by computing Silhouette score
+ ClusteringEvaluator evaluator = new ClusteringEvaluator();
+
+ double silhouette = evaluator.evaluate(predictions);
+ System.out.println("Silhouette with squared euclidean distance = " + silhouette);
--- End diff --
Thanks. I don't expect to change the code again, but I can fix this capitalization if you want.
---
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19676
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84681/
---
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/19676
It's good to have this. Sorry for the late response; I will make a pass tomorrow. Thanks.
---
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19676
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/19676#discussion_r155913190
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,17 @@ public static void main(String[] args) {
KMeans kmeans = new KMeans().setK(2).setSeed(1L);
KMeansModel model = kmeans.fit(dataset);
- // Evaluate clustering by computing Within Set Sum of Squared Errors.
- double WSSSE = model.computeCost(dataset);
- System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+ // Make predictions
+ Dataset<Row> predictions = model.transform(dataset);
+
+ // Evaluate clustering by computing Silhouette score
+ ClusteringEvaluator evaluator = new ClusteringEvaluator()
+ .setFeaturesCol("features")
+ .setPredictionCol("prediction")
--- End diff --
We use default values here, so it's not necessary to set them explicitly. We should keep examples as simple as possible. Thanks.
---
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19676
**[Test build #83500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83500/testReport)** for PR 19676 at commit [`4c4f83e`](https://github.com/apache/spark/commit/4c4f83e97d9bd2d8771452498581bf9ce43bd28d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19676
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83500/
---
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19676
**[Test build #84681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84681/testReport)** for PR 19676 at commit [`feb619d`](https://github.com/apache/spark/commit/feb619d657f6ff66dec240ee4619e6f53208ac18).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---