Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2018/06/13 17:56:51 UTC

[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...

GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/21557

    [SPARK-24439][ML][PYTHON]Add distanceMeasure to BisectingKMeans in PySpark

    
    
    ## What changes were proposed in this pull request?
    
    Add `distanceMeasure` to BisectingKMeans in Python.
    
    ## How was this patch tested?
    
    Added a doctest and also tested it manually.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-24439

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21557.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21557
    
----
commit 7f4cb6177003482461c063f90e1e642f714ddcea
Author: Huaxin Gao <hu...@...>
Date:   2018-06-13T17:38:15Z

    [SPARK-24439][ML][PYTHON]Add distanceMeasure to BisectingKMeans in PySpark

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21557


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92404/


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91788/


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Merged to master, thanks @huaxingao!


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/529/


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    **[Test build #92404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92404/testReport)** for PR 21557 at commit [`7ca733b`](https://github.com/apache/spark/commit/7ca733beeb18808e145dc2786f9c2c6c1ec40031).


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    **[Test build #91788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91788/testReport)** for PR 21557 at commit [`7f4cb61`](https://github.com/apache/spark/commit/7f4cb6177003482461c063f90e1e642f714ddcea).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class KMeans(JavaEstimator, HasDistanceMeasure, HasFeaturesCol, HasPredictionCol, HasMaxIter,`
      * `class BisectingKMeans(JavaEstimator, HasDistanceMeasure, HasFeaturesCol, HasPredictionCol,`
      * `class HasDistanceMeasure(Params):`


---


[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21557#discussion_r198675445
  
    --- Diff: python/pyspark/ml/clustering.py ---
    @@ -622,10 +621,10 @@ def __init__(self, featuresCol="features", predictionCol="prediction", maxIter=2
         @keyword_only
         @since("2.0.0")
         def setParams(self, featuresCol="features", predictionCol="prediction", maxIter=20,
    -                  seed=None, k=4, minDivisibleClusterSize=1.0):
    +                  seed=None, k=4, minDivisibleClusterSize=1.0, distanceMeasure="euclidean"):
             """
             setParams(self, featuresCol="features", predictionCol="prediction", maxIter=20, \
    -                  seed=None, k=4, minDivisibleClusterSize=1.0)
    +                  seed=None, k=4, minDivisibleClusterSize=1.0, distanceMeasure="euclidean")
             Sets params for BisectingKMeans.
    --- End diff --
    
    I know we already have `setDistanceMeasure` and `getDistanceMeasure` methods from the shared param, but can you also add them here so we can use the `since` decorator?  (same as KMeans)
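
As a rough illustration of the request above, the pattern might look like the following self-contained sketch: the estimator re-declares the accessors it already inherits from the shared param so that each one can carry its own version annotation. The `since` decorator and the `HasDistanceMeasure` mixin here are simplified stand-ins written for this example, not the PySpark implementations, and the version string "2.4.0" is an assumption.

```python
def since(version):
    """Stand-in for a `since` decorator: appends a version note to the docstring."""
    def decorator(func):
        doc = (func.__doc__ or "").rstrip()
        func.__doc__ = doc + "\n\n.. versionadded:: %s" % version
        return func
    return decorator


class HasDistanceMeasure(object):
    """Simplified stand-in for the shared-param mixin; carries no version info."""

    def __init__(self):
        self._distanceMeasure = "euclidean"

    def setDistanceMeasure(self, value):
        """Sets the value of distanceMeasure."""
        self._distanceMeasure = value
        return self

    def getDistanceMeasure(self):
        """Gets the value of distanceMeasure."""
        return self._distanceMeasure


class BisectingKMeans(HasDistanceMeasure):
    """Toy estimator that overrides the accessors only to attach `since`."""

    @since("2.4.0")  # assumed version, for illustration only
    def setDistanceMeasure(self, value):
        """Sets the value of distanceMeasure."""
        return super(BisectingKMeans, self).setDistanceMeasure(value)

    @since("2.4.0")  # assumed version, for illustration only
    def getDistanceMeasure(self):
        """Gets the value of distanceMeasure."""
        return super(BisectingKMeans, self).getDistanceMeasure()


bkm = BisectingKMeans().setDistanceMeasure("cosine")
print(bkm.getDistanceMeasure())
```

The overrides add no behavior of their own; they delegate to the mixin, and only the docstrings differ between the inherited and the re-declared methods.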


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Thank you very much for your help! @BryanCutler 


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    **[Test build #91788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91788/testReport)** for PR 21557 at commit [`7f4cb61`](https://github.com/apache/spark/commit/7f4cb6177003482461c063f90e1e642f714ddcea).


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3993/


---


[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21557#discussion_r198684081
  
    --- Diff: python/pyspark/ml/clustering.py ---
    @@ -622,10 +621,10 @@ def __init__(self, featuresCol="features", predictionCol="prediction", maxIter=2
         @keyword_only
         @since("2.0.0")
         def setParams(self, featuresCol="features", predictionCol="prediction", maxIter=20,
    -                  seed=None, k=4, minDivisibleClusterSize=1.0):
    +                  seed=None, k=4, minDivisibleClusterSize=1.0, distanceMeasure="euclidean"):
             """
             setParams(self, featuresCol="features", predictionCol="prediction", maxIter=20, \
    -                  seed=None, k=4, minDivisibleClusterSize=1.0)
    +                  seed=None, k=4, minDivisibleClusterSize=1.0, distanceMeasure="euclidean")
             Sets params for BisectingKMeans.
    --- End diff --
    
    @BryanCutler Thank you very much for your review. I will make the change.


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    **[Test build #92404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92404/testReport)** for PR 21557 at commit [`7ca733b`](https://github.com/apache/spark/commit/7ca733beeb18808e145dc2786f9c2c6c1ec40031).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21557
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/102/


---
