You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2016/05/17 10:09:21 UTC

[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/13148

    [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit for clustering

    ## What changes were proposed in this pull request?
    Audit Scala API for ml.clustering.
    Fix some wrong API documentations and update outdated one.
    
    ## How was this patch tested?
    Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-15361

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13148
    
----
commit de584aa001eac755493e48d1ed0f15c4f1913591
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-05-17T10:03:51Z

    ML 2.0 QA: Scala APIs audit for clustering

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-219684645
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-219848715
  
    That's the only issue I saw.  Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13148#discussion_r63789574
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
    @@ -39,23 +39,27 @@ private[clustering] trait BisectingKMeansParams extends Params
       with HasMaxIter with HasFeaturesCol with HasSeed with HasPredictionCol {
     
       /**
    -   * Set the number of clusters to create (k). Must be > 1. Default: 2.
    +   * The desired number of leaf clusters. Must be > 1. Default: 4.
    +   * The actual number could be smaller if there are no divisible leaf clusters.
        * @group param
        */
       @Since("2.0.0")
    -  final val k = new IntParam(this, "k", "number of clusters to create", (x: Int) => x > 1)
    +  final val k = new IntParam(this, "k", "The desired number of leaf clusters. " +
    +    "Must be > 1. Default: 4.", ParamValidators.gt(1))
    --- End diff --
    
    cc @sethah / @MLnick who have also been thinking about default values in PyDocs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-219684646
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58684/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-219927964
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58743/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-219846049
  
    I'll take a look now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13148


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-220442030
  
    LGTM
    Merging with master and branch-2.0
    Thanks!
    
    @holdenk We can discuss this Pyspark Params issue more.  I agree it will be important to offer users an easy way to view the defaults in PySpark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13148#discussion_r63756694
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
    @@ -39,23 +39,27 @@ private[clustering] trait BisectingKMeansParams extends Params
       with HasMaxIter with HasFeaturesCol with HasSeed with HasPredictionCol {
     
       /**
    -   * Set the number of clusters to create (k). Must be > 1. Default: 2.
    +   * The desired number of leaf clusters. Must be > 1. Default: 4.
    +   * The actual number could be smaller if there are no divisible leaf clusters.
        * @group param
        */
       @Since("2.0.0")
    -  final val k = new IntParam(this, "k", "number of clusters to create", (x: Int) => x > 1)
    +  final val k = new IntParam(this, "k", "The desired number of leaf clusters. " +
    +    "Must be > 1. Default: 4.", ParamValidators.gt(1))
    --- End diff --
    
    Built in documentation is also displayed in PyDocs and having the default values there is incredibly useful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-219927963
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13148#discussion_r63799888
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
    @@ -39,23 +39,27 @@ private[clustering] trait BisectingKMeansParams extends Params
       with HasMaxIter with HasFeaturesCol with HasSeed with HasPredictionCol {
     
       /**
    -   * Set the number of clusters to create (k). Must be > 1. Default: 2.
    +   * The desired number of leaf clusters. Must be > 1. Default: 4.
    +   * The actual number could be smaller if there are no divisible leaf clusters.
        * @group param
        */
       @Since("2.0.0")
    -  final val k = new IntParam(this, "k", "number of clusters to create", (x: Int) => x > 1)
    +  final val k = new IntParam(this, "k", "The desired number of leaf clusters. " +
    +    "Must be > 1. Default: 4.", ParamValidators.gt(1))
    --- End diff --
    
    For what use cases?  I could imagine
    ```
    binarizer = Binarizer()
    binarizer.threshold  # prints Param fields, including built-in doc
    ```
    
    Maybe instead we should add a method explain taking a string:
    ```
    binarizer.explainParam("threshold")  # This currently requires a Param, not a String
    ```
    or
    ```
    binarizer.explain("threshold")  # This could be a shorter alias.
    ```
    
    I'd prefer not to specify defaults in extra places since it is easy to type the wrong value or get them out of sync if defaults change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-219923723
  
    **[Test build #58743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58743/consoleFull)** for PR 13148 at commit [`c0f3bf7`](https://github.com/apache/spark/commit/c0f3bf753930c04252885b1260449348fa2b9f61).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-219927902
  
    **[Test build #58743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58743/consoleFull)** for PR 13148 at commit [`c0f3bf7`](https://github.com/apache/spark/commit/c0f3bf753930c04252885b1260449348fa2b9f61).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-219684430
  
    **[Test build #58684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58684/consoleFull)** for PR 13148 at commit [`de584aa`](https://github.com/apache/spark/commit/de584aa001eac755493e48d1ed0f15c4f1913591).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-219675843
  
    **[Test build #58684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58684/consoleFull)** for PR 13148 at commit [`de584aa`](https://github.com/apache/spark/commit/de584aa001eac755493e48d1ed0f15c4f1913591).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13148#discussion_r63600571
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
    @@ -39,23 +39,27 @@ private[clustering] trait BisectingKMeansParams extends Params
       with HasMaxIter with HasFeaturesCol with HasSeed with HasPredictionCol {
     
       /**
    -   * Set the number of clusters to create (k). Must be > 1. Default: 2.
    +   * The desired number of leaf clusters. Must be > 1. Default: 4.
    +   * The actual number could be smaller if there are no divisible leaf clusters.
        * @group param
        */
       @Since("2.0.0")
    -  final val k = new IntParam(this, "k", "number of clusters to create", (x: Int) => x > 1)
    +  final val k = new IntParam(this, "k", "The desired number of leaf clusters. " +
    +    "Must be > 1. Default: 4.", ParamValidators.gt(1))
    --- End diff --
    
    No need to state default in built-in doc.  Scala doc is OK.  Built-in doc is only displayed by explainParams, which reads from the defaultParamMap already.  Same for the other param changes below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/13148#issuecomment-220446944
  
    @jkbradley see the discussion/proposed solutions on https://github.com/apache/spark/pull/12914


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org