Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2018/06/13 17:56:51 UTC
[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...
GitHub user huaxingao opened a pull request:
https://github.com/apache/spark/pull/21557
[SPARK-24439][ML][PYTHON]Add distanceMeasure to BisectingKMeans in PySpark
## What changes were proposed in this pull request?
Add the `distanceMeasure` param to `BisectingKMeans` in Python.
## How was this patch tested?
Added a doctest and also tested it manually.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/huaxingao/spark spark-24439
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21557.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21557
----
commit 7f4cb6177003482461c063f90e1e642f714ddcea
Author: Huaxin Gao <hu...@...>
Date: 2018-06-13T17:38:15Z
[SPARK-24439][ML][PYTHON]Add distanceMeasure to BisectingKMeans in PySpark
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21557
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21557
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92404/
Test PASSed.
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21557
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21557
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21557
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91788/
Test PASSed.
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21557
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21557
Merged to master, thanks @huaxingao!
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21557
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/529/
Test PASSed.
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21557
**[Test build #92404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92404/testReport)** for PR 21557 at commit [`7ca733b`](https://github.com/apache/spark/commit/7ca733beeb18808e145dc2786f9c2c6c1ec40031).
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21557
**[Test build #91788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91788/testReport)** for PR 21557 at commit [`7f4cb61`](https://github.com/apache/spark/commit/7f4cb6177003482461c063f90e1e642f714ddcea).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class KMeans(JavaEstimator, HasDistanceMeasure, HasFeaturesCol, HasPredictionCol, HasMaxIter,`
* `class BisectingKMeans(JavaEstimator, HasDistanceMeasure, HasFeaturesCol, HasPredictionCol,`
* `class HasDistanceMeasure(Params):`
---
[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21557#discussion_r198675445
--- Diff: python/pyspark/ml/clustering.py ---
@@ -622,10 +621,10 @@ def __init__(self, featuresCol="features", predictionCol="prediction", maxIter=2
     @keyword_only
     @since("2.0.0")
     def setParams(self, featuresCol="features", predictionCol="prediction", maxIter=20,
-                  seed=None, k=4, minDivisibleClusterSize=1.0):
+                  seed=None, k=4, minDivisibleClusterSize=1.0, distanceMeasure="euclidean"):
         """
         setParams(self, featuresCol="features", predictionCol="prediction", maxIter=20, \
-                  seed=None, k=4, minDivisibleClusterSize=1.0)
+                  seed=None, k=4, minDivisibleClusterSize=1.0, distanceMeasure="euclidean")
         Sets params for BisectingKMeans.
--- End diff --
I know we already have `setDistanceMeasure` and `getDistanceMeasure` methods from the shared param, but can you also add them here so we can use the `since` decorator? (same as KMeans)
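
For readers unfamiliar with the pattern being requested, here is a minimal sketch in plain Python. It is not the code from this PR: the `since` decorator and the `HasDistanceMeasure` mixin below are simplified, hypothetical stand-ins for the PySpark ones so the snippet runs without Spark, and the version string `"2.4.0"` is an assumption, not something stated in this thread. The idea is that the subclass re-declares the inherited setter and getter only so that each can carry its own version annotation.

```python
def since(version):
    """Hypothetical stand-in for PySpark's `since` decorator.

    Appends a version note to the decorated function's docstring.
    """
    def decorator(func):
        doc = (func.__doc__ or "").rstrip()
        func.__doc__ = doc + "\n\n.. versionadded:: %s" % version
        return func
    return decorator


class HasDistanceMeasure(object):
    """Simplified stand-in for the shared-param mixin (no version info)."""

    def __init__(self):
        self._distanceMeasure = "euclidean"

    def setDistanceMeasure(self, value):
        self._distanceMeasure = value
        return self

    def getDistanceMeasure(self):
        return self._distanceMeasure


class BisectingKMeans(HasDistanceMeasure):
    """Toy estimator that overrides the shared accessors to annotate them."""

    @since("2.4.0")  # assumed version, for illustration only
    def setDistanceMeasure(self, value):
        """Sets the value of distanceMeasure."""
        return super(BisectingKMeans, self).setDistanceMeasure(value)

    @since("2.4.0")  # assumed version, for illustration only
    def getDistanceMeasure(self):
        """Gets the value of distanceMeasure."""
        return super(BisectingKMeans, self).getDistanceMeasure()


bkm = BisectingKMeans().setDistanceMeasure("cosine")
print(bkm.getDistanceMeasure())
print(BisectingKMeans.setDistanceMeasure.__doc__)
```

The overrides do not change behavior; they delegate to the mixin and exist only so the generated documentation can record when the accessor became available on this particular class.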
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21557
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:
https://github.com/apache/spark/pull/21557
Thank you very much for your help! @BryanCutler
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21557
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21557
**[Test build #91788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91788/testReport)** for PR 21557 at commit [`7f4cb61`](https://github.com/apache/spark/commit/7f4cb6177003482461c063f90e1e642f714ddcea).
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21557
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3993/
Test PASSed.
---
[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21557#discussion_r198684081
--- Diff: python/pyspark/ml/clustering.py ---
@@ -622,10 +621,10 @@ def __init__(self, featuresCol="features", predictionCol="prediction", maxIter=2
     @keyword_only
     @since("2.0.0")
     def setParams(self, featuresCol="features", predictionCol="prediction", maxIter=20,
-                  seed=None, k=4, minDivisibleClusterSize=1.0):
+                  seed=None, k=4, minDivisibleClusterSize=1.0, distanceMeasure="euclidean"):
         """
         setParams(self, featuresCol="features", predictionCol="prediction", maxIter=20, \
-                  seed=None, k=4, minDivisibleClusterSize=1.0)
+                  seed=None, k=4, minDivisibleClusterSize=1.0, distanceMeasure="euclidean")
         Sets params for BisectingKMeans.
--- End diff --
@BryanCutler Thank you very much for your review. I will make the change.
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21557
**[Test build #92404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92404/testReport)** for PR 21557 at commit [`7ca733b`](https://github.com/apache/spark/commit/7ca733beeb18808e145dc2786f9c2c6c1ec40031).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21557
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/102/
Test PASSed.
---