Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/10/17 15:07:00 UTC
[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...
GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/22756
[SPARK-25758][ML] Deprecate computeCost on BisectingKMeans
## What changes were proposed in this pull request?
This PR deprecates the `computeCost` method on `BisectingKMeans` in favor of using `ClusteringEvaluator` to evaluate the clustering. It also introduces a `trainingCost` value in `BisectingKMeansSummary`, which exposes the same information computed on the training dataset.
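As a rough illustration of the quantity under discussion: the Scaladoc quoted in this thread describes `computeCost` as the sum of squared distances between the input points and their corresponding cluster centers. The sketch below computes that quantity in plain Python. The helper names and the sample data are hypothetical, and it assumes each point belongs to its nearest center, which simplifies how a bisecting k-means model actually assigns points.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_set_sse(points: List[Point], centers: List[Point]) -> float:
    """Sum, over all points, of the squared distance to the closest center.

    Assumption: "corresponding cluster center" is taken to be the nearest one.
    """
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]

cost = within_set_sse(points, centers)
print("Within Set Sum of Squared Errors =", cost)
```

Evaluated on the training dataset, this is the kind of number the proposed `trainingCost` value would be expected to report.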
## How was this patch tested?
Improved unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-25758
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22756.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22756
----
commit 476198910a001c8c58df3416a5b03aef6deb0882
Author: Marco Gaido <ma...@...>
Date: 2018-10-17T15:04:02Z
[SPARK-25758][ML] Deprecate computeCost on BisectingKMeans
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22756#discussion_r226023867
--- Diff: python/pyspark/ml/clustering.py ---
@@ -335,20 +335,6 @@ def clusterCenters(self):
"""Get the cluster centers, represented as a list of NumPy arrays."""
return [c.toArray() for c in self._call_java("clusterCenters")]
- @since("2.0.0")
- def computeCost(self, dataset):
--- End diff --
Hm, can you actually remove this, vs just deprecate it?
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4065/
Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22756
LGTM. thanks!
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22756
**[Test build #97501 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97501/testReport)** for PR 22756 at commit [`ed235f2`](https://github.com/apache/spark/commit/ed235f2b5f67978e4d1f687ad5e920c34d843d5c).
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22756
**[Test build #97495 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97495/testReport)** for PR 22756 at commit [`4761989`](https://github.com/apache/spark/commit/476198910a001c8c58df3416a5b03aef6deb0882).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22756#discussion_r226030159
--- Diff: python/pyspark/ml/clustering.py ---
@@ -335,20 +335,6 @@ def clusterCenters(self):
"""Get the cluster centers, represented as a list of NumPy arrays."""
return [c.toArray() for c in self._call_java("clusterCenters")]
- @since("2.0.0")
- def computeCost(self, dataset):
--- End diff --
sorry, this was not intended, I am fixing this.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22756
cc @holdenk @srowen
---
[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22756#discussion_r226177653
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -125,8 +125,13 @@ class BisectingKMeansModel private[ml] (
/**
* Computes the sum of squared distances between the input points and their corresponding cluster
* centers.
+ *
+ * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator
+ * instead. You can also get the cost on the training dataset in the summary.
*/
@Since("2.0.0")
+ @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " +
--- End diff --
Thank you for the decision, @cloud-fan !
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22756
**[Test build #97495 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97495/testReport)** for PR 22756 at commit [`4761989`](https://github.com/apache/spark/commit/476198910a001c8c58df3416a5b03aef6deb0882).
---
[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22756#discussion_r226141315
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -125,8 +125,13 @@ class BisectingKMeansModel private[ml] (
/**
* Computes the sum of squared distances between the input points and their corresponding cluster
* centers.
+ *
+ * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator
+ * instead. You can also get the cost on the training dataset in the summary.
*/
@Since("2.0.0")
+ @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " +
--- End diff --
It looks reasonable to me to deprecate it in 2.4 so that we can remove it in 3.0, if this is the last one. Then we can have a consistent ML API in 3.0 after removing these deprecated APIs.
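To make the deprecate-then-remove cycle concrete on the Python side, here is a minimal sketch of a method that still works but warns callers before a later removal. `ToyClusteringModel`, `compute_cost`, and the message text are hypothetical stand-ins, not the PySpark implementation.

```python
import warnings


class ToyClusteringModel:
    """Hypothetical stand-in for a model class; not the PySpark implementation."""

    def __init__(self, training_cost: float) -> None:
        self.training_cost = training_cost

    def compute_cost(self) -> float:
        """Deprecated accessor: still returns the value, but warns the caller."""
        warnings.warn(
            "compute_cost is deprecated and will be removed in a later release. "
            "Use an evaluator instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.training_cost


model = ToyClusteringModel(training_cost=4.0)

# Record warnings so the deprecation notice can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cost = model.compute_cost()

print(cost, [w.category.__name__ for w in caught])
```

Shipping the warning one release ahead gives users a full release cycle to migrate before the method disappears.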
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4061/
Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22756
**[Test build #97525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97525/testReport)** for PR 22756 at commit [`d5fddb5`](https://github.com/apache/spark/commit/d5fddb56b426ddda8c61cb5fef4b29763482eaf9).
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22756
@mgaido91, if you don't mind, could you split this PR into two PRs? One would add only the `deprecation` annotation; the other would add the new API and update all examples.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22756
**[Test build #97501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97501/testReport)** for PR 22756 at commit [`ed235f2`](https://github.com/apache/spark/commit/ed235f2b5f67978e4d1f687ad5e920c34d843d5c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97501/
Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22756
cc @mengxr WDYT? It does not sound like a blocker to me.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22756
LGTM
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4081/
Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22756
Let me revert it. Thanks!
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22756
shall we revert it from master as well? At least we need to update the message `This method is deprecated and will be removed in 3.0.0.`
---
[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22756
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22756
I also understand today's situation and agree with @holdenk's thought about SPARK-25765 as a blocker. Ping @cloud-fan since you are the release manager. How can we proceed with SPARK-25765?
Maybe this is due to `Preparing Spark release v2.4.0-rc4`, which happened two hours ago. We are in the middle of an unstable situation.
Also, cc @gatorsmile .
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/22756
I'm seeing this linked from https://github.com/apache/spark/pull/22764 and I'm wondering if we need to revert this. If the information is not actually available where we tell folks it is, I think we need to revert this, especially since we are in the middle of the release process. Or raise SPARK-25765 to a release blocker.
Have I misunderstood the situation here?
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22756
@dongjoon-hyun sure, thanks. I'll update asap. Thanks.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:
https://github.com/apache/spark/pull/22756
We have to revert this PR in branch-2.4. It is not a blocker and we shouldn't merge it to branch-2.4 this late in this already delayed release.
---
[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22756#discussion_r226239100
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -125,8 +125,13 @@ class BisectingKMeansModel private[ml] (
/**
* Computes the sum of squared distances between the input points and their corresponding cluster
* centers.
+ *
+ * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator
+ * instead. You can also get the cost on the training dataset in the summary.
*/
@Since("2.0.0")
+ @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " +
--- End diff --
yes this is the last one.
> Then we can have a consistent ML API in 3.0 after removing these deprecated APIs.
Yes, that's my goal in targeting this for 2.4.
Thanks.
---
[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22756#discussion_r226129280
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -125,8 +125,13 @@ class BisectingKMeansModel private[ml] (
/**
* Computes the sum of squared distances between the input points and their corresponding cluster
* centers.
+ *
+ * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator
+ * instead. You can also get the cost on the training dataset in the summary.
*/
@Since("2.0.0")
+ @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " +
--- End diff --
I'm wondering if this PR is a blocker for Spark 2.4. According to the JIRA description (Improvement/Minor), it is not; but without it we cannot remove this method in 3.0.0, because we would not have announced the deprecation before 3.0.0.
cc @cloud-fan since he is a release manager.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97495/
Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22756
**[Test build #97525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97525/testReport)** for PR 22756 at commit [`d5fddb5`](https://github.com/apache/spark/commit/d5fddb56b426ddda8c61cb5fef4b29763482eaf9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97525/
Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22756
Done
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22756
Merged to master/branch-2.4.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22756
Thank you, @mgaido91 and all!
---
[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22756#discussion_r226181011
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java ---
@@ -50,9 +51,14 @@ public static void main(String[] args) {
BisectingKMeans bkm = new BisectingKMeans().setK(2).setSeed(1);
BisectingKMeansModel model = bkm.fit(dataset);
- // Evaluate clustering.
- double cost = model.computeCost(dataset);
- System.out.println("Within Set Sum of Squared Errors = " + cost);
+ // Make predictions
+ Dataset<Row> predictions = model.transform(dataset);
+
+ // Evaluate clustering by computing Silhouette score
+ ClusteringEvaluator evaluator = new ClusteringEvaluator();
+
+ double silhouette = evaluator.evaluate(predictions);
+ System.out.println("Silhouette with squared euclidean distance = " + silhouette);
--- End diff --
@mgaido91 .
If we are going to change all `ml` examples for deprecation, we had better change the following, too.
- https://github.com/apache/spark/blob/master/examples/src/main/python/ml/bisecting_k_means_example.py#L45
```python
# Evaluate clustering.
cost = model.computeCost(dataset)
```
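One possible shape for the updated Python example, mirroring the Java diff above, is sketched here. It is an assumption about how the change could look, not the code that was merged; the function name and its parameters are hypothetical. `dataset` is assumed to be a Spark DataFrame with a `features` vector column, and running it requires a PySpark installation.

```python
def evaluate_bisecting_kmeans(dataset, k=2, seed=1):
    """Fit BisectingKMeans on `dataset` and return its Silhouette score.

    Assumption: `dataset` is a Spark DataFrame with a "features" vector column.
    """
    # Imported lazily so this sketch can be loaded without Spark installed.
    from pyspark.ml.clustering import BisectingKMeans
    from pyspark.ml.evaluation import ClusteringEvaluator

    # Train the model, as the existing example does.
    model = BisectingKMeans().setK(k).setSeed(seed).fit(dataset)

    # Make predictions; this adds a "prediction" column.
    predictions = model.transform(dataset)

    # Evaluate clustering by computing the Silhouette score
    # instead of calling the deprecated computeCost.
    evaluator = ClusteringEvaluator()
    return evaluator.evaluate(predictions)
```

With an active `SparkSession`, the call would look like `silhouette = evaluate_bisecting_kmeans(dataset)`, followed by a `print` of the score in place of the old cost output.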
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22756
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22756
reverted from master. Let's move the discussion to https://github.com/apache/spark/pull/22764
---
[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22756
Yes, I agree: if we are not going to deprecate it in 2.4, we also need to revert it on master because of @cloud-fan's comment.
This would mean we won't have consistency with `KMeans` though, which is not that good IMHO.
Thanks.
---
[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22756#discussion_r226239196
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java ---
@@ -50,9 +51,14 @@ public static void main(String[] args) {
BisectingKMeans bkm = new BisectingKMeans().setK(2).setSeed(1);
BisectingKMeansModel model = bkm.fit(dataset);
- // Evaluate clustering.
- double cost = model.computeCost(dataset);
- System.out.println("Within Set Sum of Squared Errors = " + cost);
+ // Make predictions
+ Dataset<Row> predictions = model.transform(dataset);
+
+ // Evaluate clustering by computing Silhouette score
+ ClusteringEvaluator evaluator = new ClusteringEvaluator();
+
+ double silhouette = evaluator.evaluate(predictions);
+ System.out.println("Silhouette with squared euclidean distance = " + silhouette);
--- End diff --
thanks I'll do @dongjoon-hyun
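For readers unfamiliar with the metric printed in the diff above, the following plain-Python sketch shows one common way to compute a Silhouette score with squared Euclidean distance. It is a naive quadratic-time illustration under stated assumptions (hypothetical helper names and toy data; singleton clusters contribute 0), not Spark's `ClusteringEvaluator` implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient using squared Euclidean distance."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("Silhouette needs at least two clusters")

    total = 0.0
    for i, (p, label) in enumerate(zip(points, labels)):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # assumption: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(squared_distance(p, points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(squared_distance(p, points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two tight, well-separated clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
print("Silhouette with squared euclidean distance =", silhouette(points, labels))
```

Unlike the within-set sum of squared errors, the score is bounded between -1 and 1, with values near 1 indicating compact, well-separated clusters.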
---