You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by petro-rudenko <gi...@git.apache.org> on 2015/04/14 17:18:27 UTC

[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

GitHub user petro-rudenko opened a pull request:

    https://github.com/apache/spark/pull/5510

    [SPARK-6901][Ml]ParamGridBuilder.build with no grids should return an emty array

    ParamGridBuilder.build with no grids returns array with an empty param map.
    ```scala
    assert((new ParamGridBuilder).build().size == 1)
    ```
    I have a logic if ParamGridBuilder is empty - then not use CrossValidator. It confuses because if the ParamGridBuilder has one grid point in it will also return an array of size 1:
    ```scala
    assert((new ParamGridBuilder).addGrid(lr.regParam, Array(0.1)).build().size == 1)
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/petro-rudenko/spark SPARK-6901

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5510.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5510
    
----
commit 742bd209c7fd5fc82c65a86a1b28de2470db018b
Author: Peter Rudenko <pe...@gmail.com>
Date:   2015-04-14T15:12:22Z

    [SPARK-6901][Ml]ParamGridBuilder.build with no grids should return an empty array

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5510#discussion_r28338742
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/ParamGridBuilderSuite.scala ---
    @@ -59,5 +59,9 @@ class ParamGridBuilderSuite extends FunSuite {
           (10, "input1"),
           (20, "input1"))
         validateGrid(maps1, expected1)
    +
    +    val maps2 = (new ParamGridBuilder()).build()
    --- End diff --
    
    Trivial: The initial set of parens isn't needed. I think you can write just `Set.empty[(Int,String)]` alone?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by petro-rudenko <gi...@git.apache.org>.
Github user petro-rudenko commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-94418041
  
    For my case i can live with default behaviour. It's just not intuitive that empty ParamGridBuilder returns array of size 1 and also not clear how to handle just 1 parameter. E.g. if there's only 1 param just set it explicitly and not use crossvalidation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-92906766
  
      [Test build #30247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30247/consoleFull) for   PR 5510 at commit [`742bd20`](https://github.com/apache/spark/commit/742bd209c7fd5fc82c65a86a1b28de2470db018b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-92934748
  
      [Test build #30250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30250/consoleFull) for   PR 5510 at commit [`94cfaa7`](https://github.com/apache/spark/commit/94cfaa7053d5a732bdae21d4412d771c36a30391).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-93366639
  
      [Test build #30337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30337/consoleFull) for   PR 5510 at commit [`7d432f1`](https://github.com/apache/spark/commit/7d432f158915117b3b3444d1356d76f8ae21cba0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-92984899
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30247/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-93056773
  
    @srowen @petro-rudenko My only question about this patch is what to do with CrossValidator.  Users will likely try passing an empty grid to CrossValidator sometime in the future.  Should CrossValidator fail with an error message, or should it train with the estimator's built-in parameters?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-94190375
  
    @petro-rudenko what do you think about this instead? returning 1 empty map does make some sense viewed from this perspective. Does it cause a problem you can't work around for your use case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5510#discussion_r28338583
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ParamGridBuilder.scala ---
    @@ -100,10 +100,11 @@ class ParamGridBuilder {
        * Builds and returns all combinations of parameters specified by the param grid.
        */
       def build(): Array[ParamMap] = {
    -    var paramMaps = Array(new ParamMap)
    +    var paramMaps = Array.empty[ParamMap]
    --- End diff --
    
    Maybe simpler just to check `values.isEmpty` at the outset?
    Not really related, but I don't see why the loop needs an extra `newParamMaps` variable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-93418495
  
      [Test build #30337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30337/consoleFull) for   PR 5510 at commit [`7d432f1`](https://github.com/apache/spark/commit/7d432f158915117b3b3444d1356d76f8ae21cba0).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5510#discussion_r28369309
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ParamGridBuilder.scala ---
    @@ -100,13 +100,16 @@ class ParamGridBuilder {
        * Builds and returns all combinations of parameters specified by the param grid.
        */
       def build(): Array[ParamMap] = {
    -    var paramMaps = Array(new ParamMap)
    -    paramGrid.foreach { case (param, values) =>
    -      val newParamMaps = values.flatMap { v =>
    -        paramMaps.map(_.copy.put(param.asInstanceOf[Param[Any]], v))
    +    if (paramGrid.isEmpty) Array.empty[ParamMap]
    --- End diff --
    
    style: Can you please put braces around this part as well to match the braces around the "else" part?  (That comment from @srowen was hidden by the update)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by petro-rudenko <gi...@git.apache.org>.
Github user petro-rudenko commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-94419332
  
    For my case it means:
    ```scala
    (new ParamGridBuilder).addGrid(lr.regParam, Array(0.1)) == (lr.regParam=0.1 && new ParamGridBuilder.build())
    ```
    
    So if there's only 1 param - just overwrite the default value and again run as with empty param map.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-92996783
  
      [Test build #30250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30250/consoleFull) for   PR 5510 at commit [`94cfaa7`](https://github.com/apache/spark/commit/94cfaa7053d5a732bdae21d4412d771c36a30391).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch **adds the following new dependencies:**
       * `commons-math3-3.4.1.jar`
    
     * This patch **removes the following dependencies:**
       * `commons-math3-3.1.1.jar`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-93418514
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30337/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-92996829
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30250/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-94754511
  
    This was a useful discussion. Would you mind closing this PR though?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-94418875
  
    An empty builder means "don't try varying anything" which means "try only the defaults". Meh, makes some sense, looked at that way. What's the question about 1 param?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-94419799
  
    Yes, that seems like the right behavior. In this case `regParam` is always forced to be 0.1 in all combinations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-94506664
  
    I agree that seems like the correct behavior.  It's hard to find a "right" solution in degenerate cases, so there may be no perfect answer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5510#discussion_r28339474
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ParamGridBuilder.scala ---
    @@ -100,10 +100,11 @@ class ParamGridBuilder {
        * Builds and returns all combinations of parameters specified by the param grid.
        */
       def build(): Array[ParamMap] = {
    -    var paramMaps = Array(new ParamMap)
    +    var paramMaps = Array.empty[ParamMap]
    --- End diff --
    
    Yes, though I'd put braces in both branches and indent if writing it this way, or just handle it all in an early return statement.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by petro-rudenko <gi...@git.apache.org>.
Github user petro-rudenko closed the pull request at:

    https://github.com/apache/spark/pull/5510


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by petro-rudenko <gi...@git.apache.org>.
Github user petro-rudenko commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-93412249
  
    Ideally crossvalidator should handle next cases:
    1) No parameters at all:  just run est.fit(dataset, new ParamMap)
    2) 1 param: set this param to estimator (assume it's a weird way to override default param) and again do step 1.
    3) 2+ params: do crossvalidation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by petro-rudenko <gi...@git.apache.org>.
Github user petro-rudenko commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-93373411
  
    Maybe in Crossvalidator handle empty estimatorParamMap?
    ```scala
    /** @group setParam */
      def setEstimatorParamMaps(value: Array[ParamMap]): this.type = {
        if (value.isEmpty) {
          set(estimatorParamMaps, Array(new ParamMap))
        } else {
          set(estimatorParamMaps, value)
        }
      }
    ```
    
    ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-93287009
  
    Thinking more about this...
    
    Hm, if the semantics are, "build with default hyperparams, overridden by any hyperparams set explicitly in `ParamMap`", then an empty `ParamMap` doesn't sound like an error. 
    
    Viewed that way, the original behavior sort of makes sense: with no values for any key, there is 1 rather than 0 combinations to try: the all-default combination. For the same reason that for example `Array[Int]().product` is 1, not 0. Sort of.
    
    Should it be an error to build a param grid with no grid at all, maybe? that would also resolve this situation.
    
    If the intent of `ParamMap` is that it should have a setting for _all_ hyperparameters, then it's an error to build a grid without a range for every hyperparameter -- and certainly to build a grid without any at all.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-92984865
  
      [Test build #30247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30247/consoleFull) for   PR 5510 at commit [`742bd20`](https://github.com/apache/spark/commit/742bd209c7fd5fc82c65a86a1b28de2470db018b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch **adds the following new dependencies:**
       * `commons-math3-3.4.1.jar`
    
     * This patch **removes the following dependencies:**
       * `commons-math3-3.1.1.jar`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by petro-rudenko <gi...@git.apache.org>.
Github user petro-rudenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5510#discussion_r28339279
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ParamGridBuilder.scala ---
    @@ -100,10 +100,11 @@ class ParamGridBuilder {
        * Builds and returns all combinations of parameters specified by the param grid.
        */
       def build(): Array[ParamMap] = {
    -    var paramMaps = Array(new ParamMap)
    +    var paramMaps = Array.empty[ParamMap]
    --- End diff --
    
    Do you mean like this:
    ```scala
    def build(): Array[ParamMap] = {
        if (paramGrid.isEmpty) Array.empty[ParamMap]
        else {
          var paramMaps = Array(new ParamMap)
          paramGrid.foreach { case (param, values) =>
            val newParamMaps = values.flatMap { v =>
              paramMaps.map(_.copy.put(param.asInstanceOf[Param[Any]], v))
            }
            paramMaps = newParamMaps.toArray
          }
          paramMaps
        }
      }
    ```
    ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6901][Ml]ParamGridBuilder.build with no...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5510#issuecomment-93468468
  
    Thinking more, I'm starting to feel like returning 1 empty ParamMap is better than returning 0 ParamMaps.  Basically, I think the natural "unit" grid should 1 empty ParamMap, rather than 0 ParamMaps.  Here are some thoughts:
    * Suppose we later add support for users doing cross products between 2 param grids (Array[ParamMap]).  Then mathematically, the "unit" grid should be 1 empty ParamMap.
    * Users should get the same behavior if they (a) create a "unit" grid and iterate over it themselves or (b) create a "unit" grid and pass it to CrossValidator.  Currently, the behavior is the same.  But if a unit grid is 0 ParamMaps and CrossValidator handles that specially, then the behavior will not be the same.
    
    I'd vote for adding a log warning but keeping the original behavior of ParamGridBuilder.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org