You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dominik-jastrzebski <gi...@git.apache.org> on 2016/05/04 09:42:05 UTC

[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

GitHub user dominik-jastrzebski opened a pull request:

    https://github.com/apache/spark/pull/12895

    [SPARK-15119] [ML] Add a validator to DecisionTreeParams.minInfoGain parameter

    ## What changes were proposed in this pull request?
    
    A validator for minGainInfo was added, assuring the value is >= 0.
    
    ## How was this patch tested?
    
    By running unit tests in org.apache.spark.ml.tree.impl.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dominik-jastrzebski/spark validator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12895
    
----
commit 53a052910733072b31ecc97bc1ff18afbc68bf6e
Author: Dominik Jastrzębski <do...@codilime.com>
Date:   2016-05-04T09:23:12Z

    [SPARK-15119] [ML] Add a validator to DecisionTreeParams.minInfoGain parameter

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-216815097
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by dominik-jastrzebski <gi...@git.apache.org>.
Github user dominik-jastrzebski commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-222117861
  
    @MLnick @sethah up


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-216823357
  
    **[Test build #57747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57747/consoleFull)** for PR 12895 at commit [`53a0529`](https://github.com/apache/spark/commit/53a052910733072b31ecc97bc1ff18afbc68bf6e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-217177347
  
    I think this is good, but while we're here can we make this a full audit of tree params? @dominik-jastrzebski would you mind having a look at the correctness of the other params `treeParams.scala`? For instance, the `impurity` validator [here](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala#L195) should use `ParamValidators.inArray(...)` instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #12895: [SPARK-15119] [ML] Add a validator to DecisionTre...

Posted by dominik-jastrzebski <gi...@git.apache.org>.
Github user dominik-jastrzebski closed the pull request at:

    https://github.com/apache/spark/pull/12895


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-217158842
  
    cc @sethah 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-219018452
  
    @dominik-jastrzebski any update?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by dominik-jastrzebski <gi...@git.apache.org>.
Github user dominik-jastrzebski commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-217449436
  
    Ok, I can check the other validators in `treeParams.scala`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-222168818
  
    @dominik-jastrzebski Thanks for pointing that out. I think a case insensitive check might be a good idea if you want to implement something, but I don't feel too strongly about it. Thoughts?
    
    cc @holdenk 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by goodsoldiersvejk <gi...@git.apache.org>.
Github user goodsoldiersvejk commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-222559584
  
    Hi,
      Marginally related to this, it would be good to improve the Spark mllib documentation to provide context for using the information gain parameter appropriately (if I'm not mistaken the information gain is simply the mutual information which is bounded between zero and the minimum of the entropy of each of two given random variables). It would also be useful to abstract out the mutual information or information gain; I ended up abusing spark decision trees to do some work with large-scale mutual information statistics because of the ready presence of labelling and distributed computation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-216813224
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #12895: [SPARK-15119] [ML] Add a validator to DecisionTreeParams...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/12895
  
    This issue is now fixed in master, are you ok closing this PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by dominik-jastrzebski <gi...@git.apache.org>.
Github user dominik-jastrzebski commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-221195720
  
    I discovered that `impurity` and `lossType` validators in several places perform a case-insensitive comparison, for example: `(value: String) => GBTRegressorParams.supportedLossTypes.contains(value.toLowerCase)`.
    
    Probably it's the reason why ParamValidators aren't used here. Maybe we should introduce something like `ParamValidators.inArray(expected: Array[String], caseInsensitive: Boolean = false)`?
    
    Beside that, I didn't find any missing or incorrect validators.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-216823446
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57747/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-216823445
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #12895: [SPARK-15119] [ML] Add a validator to DecisionTreeParams...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12895
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12895#issuecomment-216815632
  
    **[Test build #57747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57747/consoleFull)** for PR 12895 at commit [`53a0529`](https://github.com/apache/spark/commit/53a052910733072b31ecc97bc1ff18afbc68bf6e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org