Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2017/03/03 04:37:19 UTC

[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/17146

    [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegression supports tweedie distribution.

    ## What changes were proposed in this pull request?
    PySpark ```GeneralizedLinearRegression``` supports tweedie distribution.
    
    ## How was this patch tested?
    Add unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-19806

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17146.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17146
    
----
commit fcb5cfb10d0a40f0556d8ac8ab415f4e66ab62be
Author: Yanbo Liang <yb...@gmail.com>
Date:   2017-03-03T04:36:02Z

    PySpark GeneralizedLinearRegression supports tweedie distribution.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    LGTM




[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17146#discussion_r104853968
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1223,6 +1223,26 @@ def test_apply_binary_term_freqs(self):
                                        ": expected " + str(expected[i]) + ", got " + str(features[i]))
     
     
    +class GeneralizedLinearRegressionTest(SparkSessionTestCase):
    +
    +    def test_tweedie_distribution(self):
    +
    +        df = self.spark.createDataFrame(
    +            [(1.0, Vectors.dense(0.0, 0.0)),
    +             (1.0, Vectors.dense(1.0, 2.0)),
    +             (2.0, Vectors.dense(0.0, 0.0)),
    +             (2.0, Vectors.dense(1.0, 1.0)), ], ["label", "features"])
    +
    +        glr = GeneralizedLinearRegression(family="tweedie", variancePower=1.6)
    +        model = glr.fit(df)
    +        self.assertTrue(np.allclose(model.coefficients.toArray(), [-0.4645, 0.3402], atol=1E-4))
    --- End diff --
    
    I'm curious: where did the expected values come from? 




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    **[Test build #73810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73810/testReport)** for PR 17146 at commit [`fcb5cfb`](https://github.com/apache/spark/commit/fcb5cfb10d0a40f0556d8ac8ab415f4e66ab62be).




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17146#discussion_r104428117
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1305,6 +1305,9 @@ class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, HasFeaturesCol, Ha
     
         * "gamma"    -> "inverse", "identity", "log"
     
    +    * "tweedie"  -> power link function specified through "linkPower". \
    +                    The default link power in the tweedie family is 1 - variancePower.
    --- End diff --
    
    It will produce a model according to the specified ```variancePower``` and ```linkPower```. The doc here explains the default value of ```linkPower``` when users don't specify it.
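
The default described above can be illustrated with a minimal sketch in plain Python. It is not Spark's implementation: the helper names are hypothetical, and treating a link power of 0 as the log link is an assumption that follows the usual power-link convention rather than anything stated in this thread.

```python
import math


def default_link_power(variance_power):
    """Default link power for the tweedie family, per the docstring: 1 - variancePower."""
    return 1.0 - variance_power


def power_link(mu, link_power):
    """Map the mean mu to the linear predictor: eta = mu ** link_power.

    Assumption: link_power == 0 is treated as the log link.
    """
    if link_power == 0.0:
        return math.log(mu)
    return mu ** link_power


def power_unlink(eta, link_power):
    """Inverse of power_link: recover the mean mu from the linear predictor eta."""
    if link_power == 0.0:
        return math.exp(eta)
    return eta ** (1.0 / link_power)


# With variancePower=1.6 and no explicit linkPower, the default is 1 - 1.6.
p = default_link_power(1.6)
eta = power_link(3.0, p)
mu = power_unlink(eta, p)
```

Here `mu` recovers the original mean of 3.0 up to floating-point error, which is all the round trip is meant to show.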




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73812/
    Test PASSed.




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    **[Test build #74014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74014/testReport)** for PR 17146 at commit [`fe1d3ae`](https://github.com/apache/spark/commit/fe1d3ae36314e385990f024bca94ab1e416476f2).




[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17146#discussion_r104322600
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1344,40 +1347,53 @@ class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, HasFeaturesCol, Ha
     
         family = Param(Params._dummy(), "family", "The name of family which is a description of " +
                        "the error distribution to be used in the model. Supported options: " +
    -                   "gaussian (default), binomial, poisson and gamma.",
    +                   "gaussian (default), binomial, poisson, gamma and tweedie.",
                        typeConverter=TypeConverters.toString)
         link = Param(Params._dummy(), "link", "The name of link function which provides the " +
                      "relationship between the linear predictor and the mean of the distribution " +
                      "function. Supported options: identity, log, inverse, logit, probit, cloglog " +
                      "and sqrt.", typeConverter=TypeConverters.toString)
         linkPredictionCol = Param(Params._dummy(), "linkPredictionCol", "link prediction (linear " +
                                   "predictor) column name", typeConverter=TypeConverters.toString)
    +    variancePower = Param(Params._dummy(), "variancePower", "The power in the variance function " +
    +                          "of the Tweedie distribution which characterizes the relationship " +
    +                          "between the variance and mean of the distribution. Only applicable " +
    +                          "for the Tweedie family. Supported values: 0 and [1, Inf).",
    +                          typeConverter=TypeConverters.toFloat)
    +    linkPower = Param(Params._dummy(), "linkPower", "The index in the power link function. " +
    +                      "Only applicable for the Tweedie family.",
    --- End diff --
    
    nit: I think it should say `applicable to`




[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17146#discussion_r104322692
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1344,40 +1347,53 @@ class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, HasFeaturesCol, Ha
     
         family = Param(Params._dummy(), "family", "The name of family which is a description of " +
                        "the error distribution to be used in the model. Supported options: " +
    -                   "gaussian (default), binomial, poisson and gamma.",
    +                   "gaussian (default), binomial, poisson, gamma and tweedie.",
                        typeConverter=TypeConverters.toString)
         link = Param(Params._dummy(), "link", "The name of link function which provides the " +
                      "relationship between the linear predictor and the mean of the distribution " +
                      "function. Supported options: identity, log, inverse, logit, probit, cloglog " +
                      "and sqrt.", typeConverter=TypeConverters.toString)
         linkPredictionCol = Param(Params._dummy(), "linkPredictionCol", "link prediction (linear " +
                                   "predictor) column name", typeConverter=TypeConverters.toString)
    +    variancePower = Param(Params._dummy(), "variancePower", "The power in the variance function " +
    +                          "of the Tweedie distribution which characterizes the relationship " +
    +                          "between the variance and mean of the distribution. Only applicable " +
    +                          "for the Tweedie family. Supported values: 0 and [1, Inf).",
    +                          typeConverter=TypeConverters.toFloat)
    +    linkPower = Param(Params._dummy(), "linkPower", "The index in the power link function. " +
    +                      "Only applicable for the Tweedie family.",
    +                      typeConverter=TypeConverters.toFloat)
     
         @keyword_only
         def __init__(self, labelCol="label", featuresCol="features", predictionCol="prediction",
                      family="gaussian", link=None, fitIntercept=True, maxIter=25, tol=1e-6,
    --- End diff --
    
    Is there a check to make sure link=None when family="Tweedie"?
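
For illustration only, a check of the kind asked about could look like the sketch below. The function `check_tweedie_params` is hypothetical and is not part of PySpark or of this pull request; it only encodes the constraints quoted in the param docs above (`variancePower` must be 0 or in [1, Inf)) together with the assumption that `link` should stay unset for the tweedie family.

```python
import math


def check_tweedie_params(family, link=None, variance_power=0.0):
    """Hypothetical validation helper; returns a list of problems (empty if none)."""
    problems = []
    if family.lower() != "tweedie":
        # The constraints below only apply to the tweedie family.
        return problems
    if link is not None:
        problems.append("link should be None for the tweedie family; use linkPower instead")
    # Supported values per the param doc: 0 and [1, Inf).
    valid_power = variance_power == 0.0 or (
        variance_power >= 1.0 and math.isfinite(variance_power)
    )
    if not valid_power:
        problems.append("variancePower must be 0 or in [1, Inf), got %r" % variance_power)
    return problems


ok = check_tweedie_params("tweedie", link=None, variance_power=1.6)
bad = check_tweedie_params("tweedie", link="log", variance_power=0.5)
```

In this sketch `ok` is an empty list, while `bad` reports both the explicit `link` and the unsupported `variancePower`.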




[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17146#discussion_r104676196
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1344,40 +1347,53 @@ class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, HasFeaturesCol, Ha
     
         family = Param(Params._dummy(), "family", "The name of family which is a description of " +
                        "the error distribution to be used in the model. Supported options: " +
    -                   "gaussian (default), binomial, poisson and gamma.",
    +                   "gaussian (default), binomial, poisson, gamma and tweedie.",
                        typeConverter=TypeConverters.toString)
         link = Param(Params._dummy(), "link", "The name of link function which provides the " +
                      "relationship between the linear predictor and the mean of the distribution " +
                      "function. Supported options: identity, log, inverse, logit, probit, cloglog " +
                      "and sqrt.", typeConverter=TypeConverters.toString)
         linkPredictionCol = Param(Params._dummy(), "linkPredictionCol", "link prediction (linear " +
                                   "predictor) column name", typeConverter=TypeConverters.toString)
    +    variancePower = Param(Params._dummy(), "variancePower", "The power in the variance function " +
    +                          "of the Tweedie distribution which characterizes the relationship " +
    +                          "between the variance and mean of the distribution. Only applicable " +
    +                          "for the Tweedie family. Supported values: 0 and [1, Inf).",
    +                          typeConverter=TypeConverters.toFloat)
    +    linkPower = Param(Params._dummy(), "linkPower", "The index in the power link function. " +
    +                      "Only applicable for the Tweedie family.",
    +                      typeConverter=TypeConverters.toFloat)
     
         @keyword_only
         def __init__(self, labelCol="label", featuresCol="features", predictionCol="prediction",
                      family="gaussian", link=None, fitIntercept=True, maxIter=25, tol=1e-6,
    --- End diff --
    
    Yeah, there is no default value for ```link```.




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    **[Test build #73811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73811/testReport)** for PR 17146 at commit [`99cbe35`](https://github.com/apache/spark/commit/99cbe3584dc2916625ca2224b1c155a47ac10c51).




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    **[Test build #73811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73811/testReport)** for PR 17146 at commit [`99cbe35`](https://github.com/apache/spark/commit/99cbe3584dc2916625ca2224b1c155a47ac10c51).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    **[Test build #74013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74013/testReport)** for PR 17146 at commit [`eef5666`](https://github.com/apache/spark/commit/eef56663fdc501429bb33662984bc5ec1e9e82a9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    **[Test build #74013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74013/testReport)** for PR 17146 at commit [`eef5666`](https://github.com/apache/spark/commit/eef56663fdc501429bb33662984bc5ec1e9e82a9).




[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17146#discussion_r104604254
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1344,40 +1347,53 @@ class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, HasFeaturesCol, Ha
     
         family = Param(Params._dummy(), "family", "The name of family which is a description of " +
                        "the error distribution to be used in the model. Supported options: " +
    -                   "gaussian (default), binomial, poisson and gamma.",
    +                   "gaussian (default), binomial, poisson, gamma and tweedie.",
                        typeConverter=TypeConverters.toString)
         link = Param(Params._dummy(), "link", "The name of link function which provides the " +
                      "relationship between the linear predictor and the mean of the distribution " +
                      "function. Supported options: identity, log, inverse, logit, probit, cloglog " +
                      "and sqrt.", typeConverter=TypeConverters.toString)
         linkPredictionCol = Param(Params._dummy(), "linkPredictionCol", "link prediction (linear " +
                                   "predictor) column name", typeConverter=TypeConverters.toString)
    +    variancePower = Param(Params._dummy(), "variancePower", "The power in the variance function " +
    +                          "of the Tweedie distribution which characterizes the relationship " +
    +                          "between the variance and mean of the distribution. Only applicable " +
    +                          "for the Tweedie family. Supported values: 0 and [1, Inf).",
    +                          typeConverter=TypeConverters.toFloat)
    +    linkPower = Param(Params._dummy(), "linkPower", "The index in the power link function. " +
    +                      "Only applicable for the Tweedie family.",
    +                      typeConverter=TypeConverters.toFloat)
     
         @keyword_only
         def __init__(self, labelCol="label", featuresCol="features", predictionCol="prediction",
                      family="gaussian", link=None, fitIntercept=True, maxIter=25, tol=1e-6,
    --- End diff --
    
    hmm, it sounds like `link` should really be `None` then




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73810/
    Test FAILed.




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74013/
    Test PASSed.




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    cc @jkbradley @actuaryzhang @MLnick @srowen 




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    **[Test build #73812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73812/testReport)** for PR 17146 at commit [`f414390`](https://github.com/apache/spark/commit/f4143905090c391bc8b67f50ba7f498de1daadbb).




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Will take a look tonight. 




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    @Antoinelypro Sorry for the late response. Actually, we have a default value if users don't set _link_ explicitly. Could you show the details of your error case? Thanks.




[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Merged into master. Thanks for reviewing.


[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17146#discussion_r104432301
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1344,40 +1347,53 @@ class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, HasFeaturesCol, Ha
     
         family = Param(Params._dummy(), "family", "The name of family which is a description of " +
                        "the error distribution to be used in the model. Supported options: " +
    -                   "gaussian (default), binomial, poisson and gamma.",
    +                   "gaussian (default), binomial, poisson, gamma and tweedie.",
                        typeConverter=TypeConverters.toString)
         link = Param(Params._dummy(), "link", "The name of link function which provides the " +
                      "relationship between the linear predictor and the mean of the distribution " +
                      "function. Supported options: identity, log, inverse, logit, probit, cloglog " +
                      "and sqrt.", typeConverter=TypeConverters.toString)
         linkPredictionCol = Param(Params._dummy(), "linkPredictionCol", "link prediction (linear " +
                                   "predictor) column name", typeConverter=TypeConverters.toString)
    +    variancePower = Param(Params._dummy(), "variancePower", "The power in the variance function " +
    +                          "of the Tweedie distribution which characterizes the relationship " +
    +                          "between the variance and mean of the distribution. Only applicable " +
    +                          "for the Tweedie family. Supported values: 0 and [1, Inf).",
    +                          typeConverter=TypeConverters.toFloat)
    +    linkPower = Param(Params._dummy(), "linkPower", "The index in the power link function. " +
    +                      "Only applicable for the Tweedie family.",
    --- End diff --
    
    Updated.
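
The supported range quoted in the `variancePower` doc string above ("0 and [1, Inf)") can be sketched as a small check in plain Python. This is only an illustration of that doc string, not the validation the library performs; `is_supported_variance_power` is a hypothetical name.

```python
import math


def is_supported_variance_power(p):
    """True if p is 0 or lies in [1, Inf), per the doc string's stated range."""
    p = float(p)
    # Exactly 0 is allowed; otherwise p must be finite and at least 1.
    return p == 0.0 or (p >= 1.0 and math.isfinite(p))
```

Under this reading, values strictly between 0 and 1 (for example 0.5) and negative values are rejected.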


[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    @actuaryzhang, would you take a look at this one? If I recall, it's one option we considered for the R API.


[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    **[Test build #74014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74014/testReport)** for PR 17146 at commit [`fe1d3ae`](https://github.com/apache/spark/commit/fe1d3ae36314e385990f024bca94ab1e416476f2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74014/
    Test PASSed.


[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17146#discussion_r104322642
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1305,6 +1305,9 @@ class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, HasFeaturesCol, Ha
     
         * "gamma"    -> "inverse", "identity", "log"
     
    +    * "tweedie"  -> power link function specified through "linkPower". \
    +                    The default link power in the tweedie family is 1 - variancePower.
    --- End diff --
    
    What happens when both variancePower and linkPower are set?
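
One way to read the doc string in the diff above, sketched in plain Python rather than taken from the patch: the `1 - variancePower` default applies only when `linkPower` is unset, and an explicit `linkPower` takes precedence. That precedence is an assumption of this sketch, as is treating a power of 0 as the log link; `resolve_link_power` and `power_link` are hypothetical names.

```python
import math


def resolve_link_power(variance_power, link_power=None):
    """Pick the link power: explicit value if set, else 1 - variancePower."""
    if link_power is None:
        return 1.0 - variance_power
    return float(link_power)


def power_link(mu, link_power):
    """Apply the power link to a mean mu > 0 (power 0 is taken as log)."""
    if link_power == 0.0:
        return math.log(mu)
    return mu ** link_power
```

With `variancePower=1.5` and no `linkPower`, the sketch resolves to a link power of -0.5; setting `linkPower=0.0` alongside it resolves to 0.0, i.e. the log link.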


[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    **[Test build #73810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73810/testReport)** for PR 17146 at commit [`fcb5cfb`](https://github.com/apache/spark/commit/fcb5cfb10d0a40f0556d8ac8ab415f4e66ab62be).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17146


[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    **[Test build #73812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73812/testReport)** for PR 17146 at commit [`f414390`](https://github.com/apache/spark/commit/f4143905090c391bc8b67f50ba7f498de1daadbb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLin...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17146#discussion_r104429240
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1344,40 +1347,53 @@ class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, HasFeaturesCol, Ha
     
         family = Param(Params._dummy(), "family", "The name of family which is a description of " +
                        "the error distribution to be used in the model. Supported options: " +
    -                   "gaussian (default), binomial, poisson and gamma.",
    +                   "gaussian (default), binomial, poisson, gamma and tweedie.",
                        typeConverter=TypeConverters.toString)
         link = Param(Params._dummy(), "link", "The name of link function which provides the " +
                      "relationship between the linear predictor and the mean of the distribution " +
                      "function. Supported options: identity, log, inverse, logit, probit, cloglog " +
                      "and sqrt.", typeConverter=TypeConverters.toString)
         linkPredictionCol = Param(Params._dummy(), "linkPredictionCol", "link prediction (linear " +
                                   "predictor) column name", typeConverter=TypeConverters.toString)
    +    variancePower = Param(Params._dummy(), "variancePower", "The power in the variance function " +
    +                          "of the Tweedie distribution which characterizes the relationship " +
    +                          "between the variance and mean of the distribution. Only applicable " +
    +                          "for the Tweedie family. Supported values: 0 and [1, Inf).",
    +                          typeConverter=TypeConverters.toFloat)
    +    linkPower = Param(Params._dummy(), "linkPower", "The index in the power link function. " +
    +                      "Only applicable for the Tweedie family.",
    +                      typeConverter=TypeConverters.toFloat)
     
         @keyword_only
         def __init__(self, labelCol="label", featuresCol="features", predictionCol="prediction",
                      family="gaussian", link=None, fitIntercept=True, maxIter=25, tol=1e-6,
    --- End diff --
    
    No. We actually allow users to set ```link``` even if the family is ```tweedie```, since we can't disable the setter. However, in this case any ```link``` value will be ignored, and we will log a warning telling users that ```link``` has no effect.
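
The behavior described in this comment can be sketched in plain Python. This is an illustration under stated assumptions, not the code in the patch: `effective_link` is a hypothetical helper, the warning text is invented, and the `variance_power=0.0` default is assumed.

```python
import warnings


def effective_link(family, link=None, variance_power=0.0, link_power=None):
    """Return the link that would actually be used for the given settings."""
    if family.lower() == "tweedie":
        if link is not None:
            # The setter cannot be disabled, so warn and ignore the value.
            warnings.warn("link is ignored when family is tweedie")
        # Assumed fallback: 1 - variancePower when linkPower is unset.
        power = link_power if link_power is not None else 1.0 - variance_power
        return ("power", float(power))
    # Other families use the named link as given.
    return ("named", link)
```

Calling `effective_link("tweedie", link="log", variance_power=1.2)` emits one warning and returns a power link, whereas `effective_link("poisson", link="log")` returns the named link untouched.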


[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    This looks good to me. Thanks


[GitHub] spark issue #17146: [SPARK-19806][ML][PySpark] PySpark GeneralizedLinearRegr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17146
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73811/
    Test FAILed.

