Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2017/01/09 13:52:59 UTC

[GitHub] spark pull request #16516: [SPARK-19133][ML] ML GLR family and link could be...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/16516

    [SPARK-19133][ML] ML GLR family and link could be uppercase.

    ## What changes were proposed in this pull request?
    Allow MLlib ```GeneralizedLinearRegression``` to accept uppercase values for ```family``` and ```link```.
    
    ## How was this patch tested?
    Updated the corresponding tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-19133

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16516
    
----
commit f1337d8761b30412944891517b79214aa69567da
Author: Yanbo Liang <yb...@gmail.com>
Date:   2017-01-09T13:46:19Z

    ML GLR family and link could be uppercase.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95512181
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -365,7 +365,7 @@ class LogisticRegression @Since("1.2.0") (
           case None => histogram.length
         }
     
    -    val isMultinomial = $(family) match {
    +    val isMultinomial = $(family).toLowerCase match {
    --- End diff --
    
    Is there a way to store the param in its lowercase form, instead of converting it to lowercase each time it is accessed? That might be less error-prone.
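
A minimal sketch of the alternative raised in this comment: normalizing the value once when it is set, rather than on every read. `SetterNormalizedParams` and its `supported` set are hypothetical stand-ins written for illustration, not Spark's `Params` API, and the family names listed are an assumption.

```scala
import java.util.Locale

// Hypothetical stand-in for a params holder; this is NOT Spark's Params API.
final class SetterNormalizedParams {
  // Assumed set of supported family names, for illustration only.
  private val supported = Set("auto", "binomial", "multinomial")
  private var family: String = "auto"

  // Normalize once, at set time, so every later read sees the lowercase form.
  def setFamily(value: String): this.type = {
    val normalized = value.toLowerCase(Locale.ROOT)
    require(supported.contains(normalized), s"Unsupported family: $value")
    family = normalized
    this
  }

  // Returns the stored (already lowercased) value.
  def getFamily: String = family
}
```

With this approach, calling `setFamily("Multinomial")` and then `getFamily` returns `multinomial`, so callers read back a value that differs from what they passed in.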




[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95745585
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -91,8 +91,8 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
       @Since("2.1.0")
       final val family: Param[String] = new Param(this, "family",
         "The name of family which is a description of the label distribution to be used in the " +
    -      s"model. Supported options: ${supportedFamilyNames.mkString(", ")}.",
    -    ParamValidators.inArray[String](supportedFamilyNames))
    +      s"model (case-insensitive). Supported options: ${supportedFamilyNames.mkString(", ")}.",
    --- End diff --
    
    @imatiach-msft I think we should not change the behavior of ```ParamValidators.inArray[String]```, since some other string params that use the original check may be case-sensitive.
    Adding a new method sounds reasonable, but I'm a bit worried about whether we should add such a concrete method to the common validation object ```ParamValidators```, which uses a generic type. I'm still open on this topic and would like to hear more thoughts. Thanks.



[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    @imatiach-msft I don't think all string params should be case-insensitive. For example, these should stay case-sensitive:
    * All column name params, like ```inputCol```.
    * Param values composed of multiple words, like ```areaUnderROC```.
    
    Please see the PR description.
    
    Many of the other string params that you found, like ```impurity```, are already case-insensitive.
    Other string params, like ```SQLTransformer.statement```, do not need to be updated, since they are not set to a single keyword. The backend Spark SQL engine will handle all kinds of SQL statements.



[GitHub] spark issue #16516: [SPARK-19133][ML] ML GLR family and link could be upperc...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    Maybe a more generic fix would be to make the method ParamValidators.inArray case-insensitive. I see this method used in a lot of places. A simple search brings up not just LogisticRegression.scala but also:
    /mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala
    /mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
    /mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala
    /mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
    /mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala
    and many others as well, and it looks like they all suffer from the same bug.  A more general fix would be preferred I think, especially to make all code consistent and use the same method, no?  It doesn't seem like any parameter should be case-sensitive.



[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    @imatiach-msft Yeah, that line is also duplicated in some other estimators. I don't think we have a good way to add it to the base class ```Param```, since it's abstract and not bound to a specific type. Do you have a better suggestion? Thanks.



[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95846332
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -91,8 +91,8 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
       @Since("2.1.0")
       final val family: Param[String] = new Param(this, "family",
         "The name of family which is a description of the label distribution to be used in the " +
    -      s"model. Supported options: ${supportedFamilyNames.mkString(", ")}.",
    -    ParamValidators.inArray[String](supportedFamilyNames))
    +      s"model (case-insensitive). Supported options: ${supportedFamilyNames.mkString(", ")}.",
    --- End diff --
    
    you're right, I searched through the code base and case-sensitivity matters when:
    1.) we are specifying some column name as a parameter
    2.) RModel formula (from RFormula.scala)
    3.) Tokenizer.scala regex pattern
    In all other cases it doesn't seem like it should matter.



[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    **[Test build #71722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71722/testReport)** for PR 16516 at commit [`f1f4c89`](https://github.com/apache/spark/commit/f1f4c8994a73d5c92ce2dbfb571129bb13d61213).



[GitHub] spark issue #16516: [SPARK-19133][ML] ML GLR family and link could be upperc...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    cc @felixcheung 



[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    looks good to me



[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95848454
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -91,8 +91,8 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
       @Since("2.1.0")
       final val family: Param[String] = new Param(this, "family",
         "The name of family which is a description of the label distribution to be used in the " +
    -      s"model. Supported options: ${supportedFamilyNames.mkString(", ")}.",
    -    ParamValidators.inArray[String](supportedFamilyNames))
    +      s"model (case-insensitive). Supported options: ${supportedFamilyNames.mkString(", ")}.",
    --- End diff --
    
    maybe we can then add a separate string param validators class to the same params.scala file in the ml folder? There should be a generic function, and the params.scala file seems to be the right place.



[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    yep, I wrote that in a comment above, I totally agree:
    1.) we are specifying some column name as a parameter
    2.) RModel formula (from RFormula.scala)
    3.) Tokenizer.scala regex pattern
    for AUC I don't think it should matter though, but it's not too significant.
    I still think we should have one method for the check instead of duplicating code, and the same for accessing the value (instead of calling .toLowerCase everywhere in the transformer/estimator code).
    I believe anywhere where there is duplicate code there is room for refactoring.  Otherwise, the changes look good to me.



[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95679101
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -365,7 +365,7 @@ class LogisticRegression @Since("1.2.0") (
           case None => histogram.length
         }
     
    -    val isMultinomial = $(family) match {
    +    val isMultinomial = $(family).toLowerCase match {
    --- End diff --
    
    maybe we need to have a different accessor that is consistently used on the transform/estimator side internally to:
    1.) change the value to lowercase 2.) trim any whitespace
    Changing the setter might cause issues because then when users try to validate that their parameters are set correctly they will see that they are modified, which is unexpected.  The case-insensitive compare should be done as in this PR, but instead of calling toLowerCase everywhere explicitly we should be accessing using some other method that normalizes the parameter internally
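
The accessor idea described above could look roughly like the following sketch: the stored value is left exactly as the user set it, and a separate internal accessor lowercases and trims it before matching. `AccessorNormalizedParams`, `normalizedFamily`, and the simplified `isMultinomial` logic are hypothetical names and behavior chosen for illustration; they are not taken from the PR or from Spark's `Params` API.

```scala
import java.util.Locale

// Hypothetical stand-in for an estimator's params; this is NOT Spark's Params API.
final class AccessorNormalizedParams {
  private var family: String = "auto"

  // The setter stores the value untouched.
  def setFamily(value: String): this.type = { family = value; this }

  // Public getter: returns exactly what the user set.
  def getFamily: String = family

  // Internal accessor: trims whitespace and lowercases, for matching only.
  private def normalizedFamily: String = family.trim.toLowerCase(Locale.ROOT)

  // Simplified decision logic (an assumption, not the real LogisticRegression code).
  def isMultinomial(numClasses: Int): Boolean = normalizedFamily match {
    case "binomial"    => false
    case "multinomial" => true
    case "auto"        => numClasses > 2
    case other         => throw new IllegalArgumentException(s"Unsupported family: $other")
  }
}
```

Here `getFamily` keeps returning the original string, so users validating their settings see no surprise, while all internal matching goes through one normalizing method instead of scattered `toLowerCase` calls.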



[GitHub] spark issue #16516: [SPARK-19133][ML] ML GLR family and link could be upperc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71082/
    Test PASSed.



[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95858814
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -365,7 +365,7 @@ class LogisticRegression @Since("1.2.0") (
           case None => histogram.length
         }
     
    -    val isMultinomial = $(family) match {
    +    val isMultinomial = $(family).toLowerCase match {
    --- End diff --
    
    @yanboliang is correct that there are other entrance points for setting and getting Params.  I agree it'd be nice to consolidate them, but that would be quite a bit of work and lower priority than other tech debt we currently have, IMO.



[GitHub] spark issue #16516: [SPARK-19133][ML] ML GLR family and link could be upperc...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    **[Test build #71722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71722/testReport)** for PR 16516 at commit [`f1f4c89`](https://github.com/apache/spark/commit/f1f4c8994a73d5c92ce2dbfb571129bb13d61213).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71178/
    Test PASSed.



[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    It looks like you can also update the metric name in the evaluators (binary, regression, multiclass) as well.  Those should be case-insensitive too, I think.



[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95679245
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -91,8 +91,8 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
       @Since("2.1.0")
       final val family: Param[String] = new Param(this, "family",
         "The name of family which is a description of the label distribution to be used in the " +
    -      s"model. Supported options: ${supportedFamilyNames.mkString(", ")}.",
    -    ParamValidators.inArray[String](supportedFamilyNames))
    +      s"model (case-insensitive). Supported options: ${supportedFamilyNames.mkString(", ")}.",
    --- End diff --
    
    Is it possible to change the ParamValidators.inArray[String] method to verify the given string in a case-insensitive way? Then you wouldn't need to make as many changes (e.g. this change could be reverted).



[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    **[Test build #71178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71178/testReport)** for PR 16516 at commit [`de6994c`](https://github.com/apache/spark/commit/de6994c1927ada5fd722650f3f5d27ec7fb25103).



[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71722/
    Test PASSed.



[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95681819
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -91,8 +91,8 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
       @Since("2.1.0")
       final val family: Param[String] = new Param(this, "family",
         "The name of family which is a description of the label distribution to be used in the " +
    -      s"model. Supported options: ${supportedFamilyNames.mkString(", ")}.",
    -    ParamValidators.inArray[String](supportedFamilyNames))
    +      s"model (case-insensitive). Supported options: ${supportedFamilyNames.mkString(", ")}.",
    --- End diff --
    
    maybe you could add a ParamValidators.inStringArray(supportedFamilyNames) method which would both normalize to lowercase and trim whitespace (?)
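
A rough sketch of what such a validator might look like. `inStringArray` is the name proposed in this comment and does not exist in Spark as far as this thread shows; the enclosing `StringParamValidators` object and the `normalize` helper are hypothetical. The sketch assumes the validator should return a predicate, mirroring the allowed-values-in, `value => Boolean`-out shape of `ParamValidators.inArray`.

```scala
import java.util.Locale

// Hypothetical helper object; not part of Spark.
object StringParamValidators {
  // Trim surrounding whitespace and lowercase in a locale-independent way.
  private def normalize(s: String): String = s.trim.toLowerCase(Locale.ROOT)

  // Returns a predicate that accepts a value if, after normalization,
  // it matches one of the (also normalized) allowed values.
  def inStringArray(allowed: Array[String]): String => Boolean = {
    val normalizedAllowed = allowed.map(normalize).toSet
    (value: String) => value != null && normalizedAllowed.contains(normalize(value))
  }
}
```

Because this is a separate, String-only method, the generic `inArray` check would keep its existing case-sensitive behavior for params such as column names.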



[GitHub] spark issue #16516: [WIP][SPARK-19155][ML] ML GLR family and link could be u...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    @imatiach-msft @felixcheung Sounds good, I opened [SPARK-19155](https://issues.apache.org/jira/browse/SPARK-19155) to track and will update this PR soon. Thanks.



[GitHub] spark issue #16516: [SPARK-19133][ML] ML GLR family and link could be upperc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    **[Test build #71082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71082/testReport)** for PR 16516 at commit [`f1337d8`](https://github.com/apache/spark/commit/f1337d8761b30412944891517b79214aa69567da).



[GitHub] spark issue #16516: [SPARK-19133][ML] ML GLR family and link could be upperc...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    **[Test build #71082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71082/testReport)** for PR 16516 at commit [`f1337d8`](https://github.com/apache/spark/commit/f1337d8761b30412944891517b79214aa69567da).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegressi...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16516



[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95592357
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -365,7 +365,7 @@ class LogisticRegression @Since("1.2.0") (
           case None => histogram.length
         }
     
    -    val isMultinomial = $(family) match {
    +    val isMultinomial = $(family).toLowerCase match {
    --- End diff --
    
    I don't think we can do that in the ```setXXX``` methods, since they are not the only entry point for setting params. Params can also be set through the following API:
    ```
    // Builds a ParamMap directly from the given pairs; no setXXX method is called.
    def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): M = {
      val map = new ParamMap()
        .put(firstParamPair)
        .put(otherParamPairs: _*)
      fit(dataset, map)
    }
    ```
    
    cc @jkbradley @sethah 
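    
    To illustrate the point about entry points, here is a minimal sketch in plain Scala. It does not use Spark: `ToyEstimator`, `ParamPair`, `withParams`, and `rawFamily` are hypothetical stand-ins invented for this example, and only the idea (a value can reach the param store without passing through the setter) comes from the discussion above.
    
    ```scala
    // Hypothetical, simplified stand-ins for the Param/ParamMap machinery.
    // None of these names are the real Spark API.
    import scala.collection.mutable
    
    final case class ParamPair(name: String, value: String)
    
    class ToyEstimator {
      private val params = mutable.Map[String, String]("family" -> "auto")
    
      // Normalizing here only covers callers that go through the setter.
      def setFamily(value: String): this.type = {
        params("family") = value.toLowerCase
        this
      }
    
      // Mirrors the idea of fit(dataset, paramPairs...): values are stored as given.
      def withParams(pairs: ParamPair*): this.type = {
        pairs.foreach(p => params(p.name) = p.value)
        this
      }
    
      // Normalizing at the point of use covers every entry point.
      def isMultinomial: Boolean = params("family").toLowerCase match {
        case "multinomial" => true
        case _ => false
      }
    
      // Exposes the stored value so the two entry points can be compared.
      def rawFamily: String = params("family")
    }
    ```
    
    In this sketch, a value passed through `withParams` keeps its original casing in the store, so lower-casing only in `setFamily` would miss it, while lower-casing where the value is matched handles both paths.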



[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    Hmm, OK, I guess that's fine. My only concern is that this line is duplicated; maybe you could factor it into a method and put it in a common place:
    (value: String) => supportedFamilyNames.contains(value.toLowerCase)
    
    Otherwise the code looks great to me!
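    
    One possible shape for such a shared helper, as a sketch only: the object name `CaseInsensitiveValidators` and the method name `inArrayIgnoreCase` are hypothetical and are not part of Spark. The helper returns a `String => Boolean` predicate, the same shape as the duplicated lambda above.
    
    ```scala
    // Hypothetical helper; the object and method names are illustrative only.
    object CaseInsensitiveValidators {
      // Returns a predicate that accepts any casing of the allowed names.
      def inArrayIgnoreCase(allowed: Array[String]): String => Boolean = {
        // Lower-case the allowed names once, up front.
        val normalized = allowed.map(_.toLowerCase).toSet
        (value: String) => normalized.contains(value.toLowerCase)
      }
    }
    ```
    
    Each param definition could then call `CaseInsensitiveValidators.inArrayIgnoreCase(supportedFamilyNames)` instead of repeating the lambda.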



[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95542552
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -365,7 +365,7 @@ class LogisticRegression @Since("1.2.0") (
           case None => histogram.length
         }
     
    -    val isMultinomial = $(family) match {
    +    val isMultinomial = $(family).toLowerCase match {
    --- End diff --
    
    It can, but I think it would need to be done in the concrete `setXXX` method each time.



[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16516#discussion_r95846429
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -91,8 +91,8 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
       @Since("2.1.0")
       final val family: Param[String] = new Param(this, "family",
         "The name of family which is a description of the label distribution to be used in the " +
    -      s"model. Supported options: ${supportedFamilyNames.mkString(", ")}.",
    -    ParamValidators.inArray[String](supportedFamilyNames))
    +      s"model (case-insensitive). Supported options: ${supportedFamilyNames.mkString(", ")}.",
    --- End diff --
    
    Searching through the code base, these are the places where we use Param[String]:
    
    spark-mllib_2.11
    org.apache.spark.ml.classification
    LogisticRegression.scala
      final val family: Param[String] = new Param(this, "family",
    MultilayerPerceptronClassifier.scala
      final val solver: Param[String] = new Param[String](this, "solver",
    NaiveBayes.scala
      final val modelType: Param[String] = new Param[String](this, "modelType", "The model type " +
    org.apache.spark.ml.clustering
    KMeans.scala
      final val initMode = new Param[String](this, "initMode", "The initialization algorithm. " +
    LDA.scala
      final val optimizer = new Param[String](this, "optimizer", "Optimizer or inference" +
      final val topicDistributionCol = new Param[String](this, "topicDistributionCol", "Output column" +
    org.apache.spark.ml.evaluation
    BinaryClassificationEvaluator.scala
      val metricName: Param[String] = {
    MulticlassClassificationEvaluator.scala
      val metricName: Param[String] = {
    RegressionEvaluator.scala
      val metricName: Param[String] = {
    org.apache.spark.ml.feature
    Bucketizer.scala
      val handleInvalid: Param[String] = new Param[String](this, "handleInvalid", "how to handle " +
    ChiSqSelector.scala
      final val selectorType = new Param[String](this, "selectorType",
    QuantileDiscretizer.scala
      val handleInvalid: Param[String] = new Param[String](this, "handleInvalid", "how to handle " +
    RFormula.scala
      val formula: Param[String] = new Param(this, "formula", "R model formula")
    SQLTransformer.scala
      final val statement: Param[String] = new Param[String](this, "statement", "SQL statement")
    Tokenizer.scala
      val pattern: Param[String] = new Param(this, "pattern", "regex pattern used for tokenizing")
    org.apache.spark.ml.param
    ParamsSuite.scala
          val param = new Param[String](dummy, "name", "doc")
    org.apache.spark.ml.param.shared
    sharedParams.scala
      final val featuresCol: Param[String] = new Param[String](this, "featuresCol", "features column name")
      final val labelCol: Param[String] = new Param[String](this, "labelCol", "label column name")
      final val predictionCol: Param[String] = new Param[String](this, "predictionCol", "prediction column name")
      final val rawPredictionCol: Param[String] = new Param[String](this, "rawPredictionCol", "raw prediction (a.k.a. confidence) column name")
    ... P...
      final val varianceCol: Param[String] = new Param[String](this, "varianceCol", "Column name for the biased sample variance of prediction")
      final val inputCol: Param[String] = new Param[String](this, "inputCol", "input column name")
      final val outputCol: Param[String] = new Param[String](this, "outputCol", "output column name")
    ... P...
      final val weightCol: Param[String] = new Param[String](this, "weightCol", "weight column name. If this is not set or empty, we treat all instance weights as 1.0")
      final val solver: Param[String] = new Param[String](this, "solver", "the solver algorithm for optimization. If this is not set or empty, default value is 'auto'")
    org.apache.spark.ml.recommendation
    ALS.scala
      val userCol = new Param[String](this, "userCol", "column name for user ids. Ids must be within " +
      val itemCol = new Param[String](this, "itemCol", "column name for item ids. Ids must be within " +
      val ratingCol = new Param[String](this, "ratingCol", "column name for ratings")
      val intermediateStorageLevel = new Param[String](this, "intermediateStorageLevel",
      val finalStorageLevel = new Param[String](this, "finalStorageLevel",
    org.apache.spark.ml.regression
    AFTSurvivalRegression.scala
      final val censorCol: Param[String] = new Param(this, "censorCol", "censor column name")
      final val quantilesCol: Param[String] = new Param(this, "quantilesCol", "quantiles column name")
    GeneralizedLinearRegression.scala
      final val family: Param[String] = new Param(this, "family",
      final val link: Param[String] = new Param(this, "link", "The name of link function " +
      final val linkPredictionCol: Param[String] = new Param[String](this, "linkPredictionCol",
    org.apache.spark.ml.tree
    treeParams.scala
      final val impurity: Param[String] = new Param[String](this, "impurity", "Criterion used for" +
      final val impurity: Param[String] = new Param[String](this, "impurity", "Criterion used for" +
      final val featureSubsetStrategy: Param[String] = new Param[String](this, "featureSubsetStrategy",
      val lossType: Param[String] = new Param[String](this, "lossType", "Loss function which GBT" +
      val lossType: Param[String] = new Param[String](this, "lossType", "Loss function which GBT" +
    org.apache.spark.ml.util
    DefaultReadWriteTest.scala
      final val stringParam: Param[String] = new Param[String](this, "stringParam", "doc")



[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    Merged into master, branch-2.1 and branch-2.0. Thanks, everyone, for reviewing.



[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    I found that this involves many issues that need to be defined further and would require refactoring some code, so I will narrow the scope of this PR to only making the ```GeneralizedLinearRegression``` ```family``` and ```link``` params case-insensitive, since it's a bug: ```GLM``` should support the ```Gamma``` family.



[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    **[Test build #71178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71178/testReport)** for PR 16516 at commit [`de6994c`](https://github.com/apache/spark/commit/de6994c1927ada5fd722650f3f5d27ec7fb25103).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16516: [SPARK-19133][ML] ML GLR family and link could be upperc...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    This is a nice fix. It looks like some other learners have this issue as well, e.g. LogisticRegression, at $(root)/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala



[GitHub] spark issue #16516: [SPARK-19133][ML] ML GLR family and link could be upperc...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16516
  
    I'd agree with that. Given the wider scope of changes, I'd suggest creating another JIRA to make the scope and impact clear; it wouldn't affect only SparkR.

