Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/10/17 15:07:00 UTC

[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/22756

    [SPARK-25758][ML] Deprecate computeCost on BisectingKMeans

    ## What changes were proposed in this pull request?
    
    This PR deprecates the `computeCost` method on `BisectingKMeans` in favor of using `ClusteringEvaluator` to evaluate the clustering. It also introduces a `trainingCost` value in `BisectingKMeansSummary`, which exposes the same information computed on the training dataset.
    
    ## How was this patch tested?
    
    Improved the unit tests.
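
For context, the Scaladoc of `computeCost` quoted in this PR's diff describes the metric as the sum of squared distances between the input points and their corresponding cluster centers. A minimal plain-Python sketch of that quantity follows. It is not Spark code: the names `squared_distance` and `clustering_cost` and the sample data are illustrative assumptions, and the cluster assignment of each point is passed in explicitly rather than predicted by a model.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clustering_cost(
    points: Sequence[Point],
    labels: Sequence[int],
    centers: Sequence[Point],
) -> float:
    """Sum of squared distances from each point to the center of its assigned cluster."""
    return sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))


# Hypothetical toy data: two clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centers = [(0.0, 1.0), (10.0, 1.0)]

cost = clustering_cost(points, labels, centers)
print("Sum of squared distances =", cost)
```

Because this value depends on the scale of the data and always shrinks as clusters are added, it is mostly useful for comparing models trained on the same dataset.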


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-25758

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22756.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22756
    
----
commit 476198910a001c8c58df3416a5b03aef6deb0882
Author: Marco Gaido <ma...@...>
Date:   2018-10-17T15:04:02Z

    [SPARK-25758][ML] Deprecate computeCost on BisectingKMeans

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22756#discussion_r226023867
  
    --- Diff: python/pyspark/ml/clustering.py ---
    @@ -335,20 +335,6 @@ def clusterCenters(self):
             """Get the cluster centers, represented as a list of NumPy arrays."""
             return [c.toArray() for c in self._call_java("clusterCenters")]
     
    -    @since("2.0.0")
    -    def computeCost(self, dataset):
    --- End diff --
    
    Hm, can you actually remove this, vs just deprecate it?


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4065/
    Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    LGTM. thanks!


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    **[Test build #97501 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97501/testReport)** for PR 22756 at commit [`ed235f2`](https://github.com/apache/spark/commit/ed235f2b5f67978e4d1f687ad5e920c34d843d5c).


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    **[Test build #97495 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97495/testReport)** for PR 22756 at commit [`4761989`](https://github.com/apache/spark/commit/476198910a001c8c58df3416a5b03aef6deb0882).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22756#discussion_r226030159
  
    --- Diff: python/pyspark/ml/clustering.py ---
    @@ -335,20 +335,6 @@ def clusterCenters(self):
             """Get the cluster centers, represented as a list of NumPy arrays."""
             return [c.toArray() for c in self._call_java("clusterCenters")]
     
    -    @since("2.0.0")
    -    def computeCost(self, dataset):
    --- End diff --
    
    Sorry, this was not intended; I am fixing it.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    cc @holdenk @srowen 


---



[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22756#discussion_r226177653
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
    @@ -125,8 +125,13 @@ class BisectingKMeansModel private[ml] (
       /**
        * Computes the sum of squared distances between the input points and their corresponding cluster
        * centers.
    +   *
    +   * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator
    +   *             instead. You can also get the cost on the training dataset in the summary.
        */
       @Since("2.0.0")
    +  @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " +
    --- End diff --
    
    Thank you for the decision, @cloud-fan !


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    **[Test build #97495 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97495/testReport)** for PR 22756 at commit [`4761989`](https://github.com/apache/spark/commit/476198910a001c8c58df3416a5b03aef6deb0882).


---



[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22756#discussion_r226141315
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
    @@ -125,8 +125,13 @@ class BisectingKMeansModel private[ml] (
       /**
        * Computes the sum of squared distances between the input points and their corresponding cluster
        * centers.
    +   *
    +   * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator
    +   *             instead. You can also get the cost on the training dataset in the summary.
        */
       @Since("2.0.0")
    +  @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " +
    --- End diff --
    
    It looks reasonable to me to deprecate it in 2.4 so that we can remove it in 3.0, if this is the last one. Then we can have a consistent ML API in 3.0 after removing these deprecated APIs.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4061/
    Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    **[Test build #97525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97525/testReport)** for PR 22756 at commit [`d5fddb5`](https://github.com/apache/spark/commit/d5fddb56b426ddda8c61cb5fef4b29763482eaf9).


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    @mgaido91, if you don't mind, could you split this PR into two PRs? One would add the `deprecation` annotation only; the other would add the new API and update all examples.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    **[Test build #97501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97501/testReport)** for PR 22756 at commit [`ed235f2`](https://github.com/apache/spark/commit/ed235f2b5f67978e4d1f687ad5e920c34d843d5c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97501/
    Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    cc @mengxr WDYT? It does not sound like a blocker to me.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    LGTM


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4081/
    Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Let me revert it. Thanks!


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Shall we revert it from master as well? At the very least, we need to update the message `This method is deprecated and will be removed in 3.0.0.`


---



[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22756


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    I also understand today's situation and agree with @holdenk's thought about SPARK-25765 as a blocker. Ping @cloud-fan since you are the release manager. How can we proceed with SPARK-25765?
    
    Maybe this is due to `Preparing Spark release v2.4.0-rc4`, which happened two hours ago. We are in the middle of an unstable situation.
    
    Also, cc @gatorsmile .


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    I'm seeing this linked from https://github.com/apache/spark/pull/22764 and I'm wondering if we need to revert this. If the information is not actually available where we tell folks it is, I think we need to revert this, especially since we are in the middle of the release process. Or raise SPARK-25765 to a release blocker.
    
    Have I misunderstood the situation here?


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    @dongjoon-hyun sure, I'll update ASAP. Thanks.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    We have to revert this PR in branch-2.4. It is not a blocker and we shouldn't merge it to branch-2.4 this late in this already delayed release.


---



[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22756#discussion_r226239100
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
    @@ -125,8 +125,13 @@ class BisectingKMeansModel private[ml] (
       /**
        * Computes the sum of squared distances between the input points and their corresponding cluster
        * centers.
    +   *
    +   * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator
    +   *             instead. You can also get the cost on the training dataset in the summary.
        */
       @Since("2.0.0")
    +  @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " +
    --- End diff --
    
    Yes, this is the last one.
    
    > Then we can have a consistent ML API in 3.0 after removing these deprecated APIs.
    
    Yes, that's my goal in targeting this for 2.4.
    
    Thanks.


---



[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22756#discussion_r226129280
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
    @@ -125,8 +125,13 @@ class BisectingKMeansModel private[ml] (
       /**
        * Computes the sum of squared distances between the input points and their corresponding cluster
        * centers.
    +   *
    +   * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator
    +   *             instead. You can also get the cost on the training dataset in the summary.
        */
       @Since("2.0.0")
    +  @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " +
    --- End diff --
    
    I'm wondering if this PR is a blocker for Spark 2.4. According to the JIRA description (Improvement/Minor), we cannot remove this in 3.0.0 because we didn't announce the deprecation before 3.0.0.
    
    cc @cloud-fan since he is a release manager.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97495/
    Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    **[Test build #97525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97525/testReport)** for PR 22756 at commit [`d5fddb5`](https://github.com/apache/spark/commit/d5fddb56b426ddda8c61cb5fef4b29763482eaf9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97525/
    Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Done


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Merged to master/branch-2.4.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Thank you, @mgaido91 and all!


---



[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22756#discussion_r226181011
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java ---
    @@ -50,9 +51,14 @@ public static void main(String[] args) {
         BisectingKMeans bkm = new BisectingKMeans().setK(2).setSeed(1);
         BisectingKMeansModel model = bkm.fit(dataset);
     
    -    // Evaluate clustering.
    -    double cost = model.computeCost(dataset);
    -    System.out.println("Within Set Sum of Squared Errors = " + cost);
    +    // Make predictions
    +    Dataset<Row> predictions = model.transform(dataset);
    +
    +    // Evaluate clustering by computing Silhouette score
    +    ClusteringEvaluator evaluator = new ClusteringEvaluator();
    +
    +    double silhouette = evaluator.evaluate(predictions);
    +    System.out.println("Silhouette with squared euclidean distance = " + silhouette);
    --- End diff --
    
    @mgaido91, if we are going to update all `ml` examples for the deprecation, we had better change the following one, too:
    - https://github.com/apache/spark/blob/master/examples/src/main/python/ml/bisecting_k_means_example.py#L45
    ```python
        # Evaluate clustering.
        cost = model.computeCost(dataset)
    ```


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Reverted from master. Let's move the discussion to https://github.com/apache/spark/pull/22764


---



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22756
  
    Yes, I agree: if we are not going to deprecate it in 2.4, we also need to revert it on master because of @cloud-fan's comment.
    
    This would mean we won't have consistency with `KMeans` though, which is not that good IMHO.
    Thanks.


---



[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22756#discussion_r226239196
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java ---
    @@ -50,9 +51,14 @@ public static void main(String[] args) {
         BisectingKMeans bkm = new BisectingKMeans().setK(2).setSeed(1);
         BisectingKMeansModel model = bkm.fit(dataset);
     
    -    // Evaluate clustering.
    -    double cost = model.computeCost(dataset);
    -    System.out.println("Within Set Sum of Squared Errors = " + cost);
    +    // Make predictions
    +    Dataset<Row> predictions = model.transform(dataset);
    +
    +    // Evaluate clustering by computing Silhouette score
    +    ClusteringEvaluator evaluator = new ClusteringEvaluator();
    +
    +    double silhouette = evaluator.evaluate(predictions);
    +    System.out.println("Silhouette with squared euclidean distance = " + silhouette);
    --- End diff --
    
    Thanks, I'll do that, @dongjoon-hyun.


---
