Posted to reviews@spark.apache.org by sethah <gi...@git.apache.org> on 2016/05/06 18:23:34 UTC

[GitHub] spark pull request: [SPARK-15181][ML][PYTHON] Python API for GLR s...

GitHub user sethah opened a pull request:

    https://github.com/apache/spark/pull/12961

    [SPARK-15181][ML][PYTHON] Python API for GLR summaries.

    ## What changes were proposed in this pull request?
    
    This patch adds a Python API for generalized linear regression summaries (training and test). This helps provide feature parity for Python GLMs.
    
    ## How was this patch tested?
    
    Added a unit test to `pyspark.ml.tests`.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sethah/spark GLR_summary

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12961.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12961
    
----
commit 440c0d1edd3be410a24880cfabe96e40a28fea87
Author: sethah <se...@gmail.com>
Date:   2016-05-06T18:06:49Z

    adding python API for glr summary

commit 17bf04b3b9d42c20e3c0e3360c07262169b143ee
Author: sethah <se...@gmail.com>
Date:   2016-05-06T18:17:22Z

    typo and link

commit 4c1f4e15ba1ca7ba1c0d46e32a33f5ff89ec6000
Author: sethah <se...@gmail.com>
Date:   2016-05-06T18:20:54Z

    fix test

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218024084
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12961#discussion_r62744604
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -905,6 +906,43 @@ def test_linear_regression_summary(self):
             sameSummary = model.evaluate(df)
             self.assertAlmostEqual(sameSummary.explainedVariance, s.explainedVariance)
     
    +    def test_linear_regression_summary(self):
    --- End diff --
    
    Yep, thanks!
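The hunk header shows the new method being added inside the class that already defines `test_linear_regression_summary`, and under that same name. In Python a second `def` with the same name in a class body rebinds the name, so the earlier test silently disappears and unittest collects only one of them. A small standalone illustration (the test bodies and the name `test_glr_summary` are placeholders, not the PR's code):

```python
import unittest


class DuplicateNameTests(unittest.TestCase):
    def test_linear_regression_summary(self):
        self.assertTrue(True)  # placeholder for the existing linear regression test

    # Same name again: this def replaces the one above in the class namespace.
    def test_linear_regression_summary(self):  # noqa: F811
        self.assertTrue(True)  # placeholder for the new GLR test


class DistinctNameTests(unittest.TestCase):
    def test_linear_regression_summary(self):
        self.assertTrue(True)

    def test_glr_summary(self):  # hypothetical distinct name
        self.assertTrue(True)


loader = unittest.TestLoader()
print(loader.getTestCaseNames(DuplicateNameTests))
# ['test_linear_regression_summary']
print(loader.getTestCaseNames(DistinctNameTests))
# ['test_glr_summary', 'test_linear_regression_summary']
```

Giving the new test a distinct name is enough for both tests to be collected and run.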


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218838552
  
    **[Test build #58507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58507/consoleFull)** for PR 12961 at commit [`fa252f2`](https://github.com/apache/spark/commit/fa252f265ac745b31c53894fddbf037569236934).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218290102
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218017545
  
    **[Test build #58180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58180/consoleFull)** for PR 12961 at commit [`10a252c`](https://github.com/apache/spark/commit/10a252cb590657cc51e40c08606e17c3e0c8935c).


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12961#discussion_r62744558
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1337,6 +1338,204 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, deviance, pValues) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary is None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return GeneralizedLinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on, where dataset is an
    +          instance of :py:class:`pyspark.sql.DataFrame`
    +        """
    +        if not isinstance(dataset, DataFrame):
    +            raise ValueError("dataset must be a DataFrame but got %s." % type(dataset))
    +        java_glr_summary = self._call_java("evaluate", dataset)
    +        return GeneralizedLinearRegressionSummary(java_glr_summary)
    +
    +
    +class GeneralizedLinearRegressionSummary(JavaWrapper):
    +    """
    +    .. note:: Experimental
    +
    +    Generalized linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Predictions output by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    --- End diff --
    
    Done.
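The methods in this hunk share one pattern: the Python object holds a reference to a JVM object and each property forwards to it, while `evaluate` type-checks its argument before delegating. The sketch below imitates that pattern in plain Python so it runs without Spark. `FakeJavaSummary`, `SummaryWrapper`, and `FakeModel` are hypothetical stand-ins, not PySpark classes; a list of (label, prediction) pairs replaces the DataFrame, and `hasSummary` is computed locally instead of being asked of the JVM.

```python
from typing import Dict, List, Optional, Tuple


class FakeJavaSummary:
    """Hypothetical stand-in for the JVM-side summary object (no Spark involved)."""

    def __init__(self, values: Dict[str, object]):
        self._values = values

    def call(self, name: str):
        return self._values[name]


class SummaryWrapper:
    """Forwards each property to the wrapped object, like the diff's JavaWrapper subclasses."""

    def __init__(self, java_obj: FakeJavaSummary):
        self._java_obj = java_obj

    def _call_java(self, name: str):
        # Simplified, hypothetical version of PySpark's _call_java helper.
        return self._java_obj.call(name)

    @property
    def predictionCol(self) -> str:
        """Field in "predictions" which gives the prediction value of each instance."""
        return self._call_java("predictionCol")

    @property
    def deviance(self) -> float:
        return self._call_java("deviance")


class FakeModel:
    """Hypothetical model exposing summary / hasSummary / evaluate as in the diff."""

    def __init__(self, training_summary: Optional[FakeJavaSummary] = None):
        self._training = training_summary

    @property
    def hasSummary(self) -> bool:
        return self._training is not None

    @property
    def summary(self) -> SummaryWrapper:
        # Mirrors the documented behavior: an error if no training summary exists.
        if self._training is None:
            raise RuntimeError("No training summary available for this model")
        return SummaryWrapper(self._training)

    def evaluate(self, dataset: List[Tuple[float, float]]) -> SummaryWrapper:
        # PySpark checks isinstance(dataset, DataFrame); a list of
        # (label, prediction) pairs stands in for a DataFrame here.
        if not isinstance(dataset, list):
            raise ValueError("dataset must be a list but got %s." % type(dataset))
        # Gaussian-family deviance with unit weights: sum of squared residuals.
        deviance = sum((label - pred) ** 2 for label, pred in dataset)
        return SummaryWrapper(FakeJavaSummary({"predictionCol": "prediction", "deviance": deviance}))


model = FakeModel(FakeJavaSummary({"predictionCol": "prediction", "deviance": 0.5}))
if model.hasSummary:  # guard before touching the training summary
    print(model.summary.deviance)  # 0.5
print(model.evaluate([(1.0, 0.5), (2.0, 2.5)]).deviance)  # 0.25 + 0.25 = 0.5
```

Keeping `summary` and `evaluate` as thin wrappers means the Python side never recomputes statistics; it only exposes what the JVM object already holds.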


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12961


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12961#discussion_r62589231
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1337,6 +1338,204 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, deviance, pValues) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary is None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return GeneralizedLinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on, where dataset is an
    +          instance of :py:class:`pyspark.sql.DataFrame`
    +        """
    +        if not isinstance(dataset, DataFrame):
    +            raise ValueError("dataset must be a DataFrame but got %s." % type(dataset))
    +        java_glr_summary = self._call_java("evaluate", dataset)
    +        return GeneralizedLinearRegressionSummary(java_glr_summary)
    +
    +
    +class GeneralizedLinearRegressionSummary(JavaWrapper):
    +    """
    +    .. note:: Experimental
    +
    +    Generalized linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Predictions output by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    --- End diff --
    
    Done, thanks!


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218853911
  
    **[Test build #58509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58509/consoleFull)** for PR 12961 at commit [`0f816d2`](https://github.com/apache/spark/commit/0f816d2ffa757609cf3c5c69b5757296a9c7078e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218830441
  
    **[Test build #58507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58507/consoleFull)** for PR 12961 at commit [`fa252f2`](https://github.com/apache/spark/commit/fa252f265ac745b31c53894fddbf037569236934).


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12961#discussion_r62558081
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1337,6 +1338,204 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, deviance, pValues) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary is None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return GeneralizedLinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on, where dataset is an
    +          instance of :py:class:`pyspark.sql.DataFrame`
    +        """
    +        if not isinstance(dataset, DataFrame):
    +            raise ValueError("dataset must be a DataFrame but got %s." % type(dataset))
    +        java_glr_summary = self._call_java("evaluate", dataset)
    +        return GeneralizedLinearRegressionSummary(java_glr_summary)
    +
    +
    +class GeneralizedLinearRegressionSummary(JavaWrapper):
    +    """
    +    .. note:: Experimental
    +
    +    Generalized linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Predictions output by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    --- End diff --
    
    Maybe consider copying the Scaladoc here
    `Field in "predictions" which gives the prediction value of each instance. This is set to a new column name if the original model's predictionCol is not set.`?
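The Scaladoc quoted above describes a fallback: when the original model's predictionCol is not set, the summary uses a freshly generated column name. A minimal sketch of that rule, assuming that an empty or missing name counts as "not set" and that the new name is a `prediction_` prefix plus a random UUID (both assumptions for illustration; `resolve_prediction_col` is a hypothetical helper, not part of PySpark):

```python
import uuid
from typing import Optional


def resolve_prediction_col(model_prediction_col: Optional[str]) -> str:
    """Hypothetical helper: name of the column in "predictions" holding each prediction."""
    if model_prediction_col:
        # The original model's predictionCol is set (non-empty): reuse it.
        return model_prediction_col
    # Not set: generate a new, collision-unlikely column name (naming scheme is assumed).
    return "prediction_" + uuid.uuid4().hex


print(resolve_prediction_col("prediction"))  # prediction
print(resolve_prediction_col(None).startswith("prediction_"))  # True
```

A random suffix avoids clobbering an existing column in the evaluated dataset without having to inspect its schema.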


[GitHub] spark pull request: [SPARK-15181][ML][PYTHON] Python API for GLR s...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-217521372
  
    cc @yanboliang @jkbradley This is part of the Python API audit for Spark 2.0 QA. Could you take a look?


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218386900
  
    Will leave open for a day in case @yanboliang or @jkbradley want to take a pass.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218854029
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58509/
    Test PASSed.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218829447
  
    @yanboliang thanks for the review! I addressed your comments.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218741200
  
    LGTM except for the two minor issues.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-217532002
  
    **[Test build #58013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58013/consoleFull)** for PR 12961 at commit [`4c1f4e1`](https://github.com/apache/spark/commit/4c1f4e15ba1ca7ba1c0d46e32a33f5ff89ec6000).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218289930
  
    **[Test build #58265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58265/consoleFull)** for PR 12961 at commit [`d30ab06`](https://github.com/apache/spark/commit/d30ab068e64586d0cca329887d41e637967ef245).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-217532129
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58013/
    Test PASSed.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218840364
  
    You probably want to merge in master; the PySpark tests have been updated to use `SparkSession`.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12961#discussion_r63008949
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1337,6 +1338,204 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, deviance, pValues) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary is None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return GeneralizedLinearRegressionTrainingSummary(java_lrt_summary)
    --- End diff --
    
    Minor: `java_lrt_summary` -> `java_grt_summary`. I guess you referred to the code of `LinearRegressionModel` in regression.py.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218023987
  
    **[Test build #58180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58180/consoleFull)** for PR 12961 at commit [`10a252c`](https://github.com/apache/spark/commit/10a252cb590657cc51e40c08606e17c3e0c8935c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218838593
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218838597
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58507/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218024086
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58180/
    Test PASSed.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12961#discussion_r62731598
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1337,6 +1338,204 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, deviance, pValues) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary is None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return GeneralizedLinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on, where dataset is an
    +          instance of :py:class:`pyspark.sql.DataFrame`
    +        """
    +        if not isinstance(dataset, DataFrame):
    +            raise ValueError("dataset must be a DataFrame but got %s." % type(dataset))
    +        java_glr_summary = self._call_java("evaluate", dataset)
    +        return GeneralizedLinearRegressionSummary(java_glr_summary)
    +
    +
    +class GeneralizedLinearRegressionSummary(JavaWrapper):
    +    """
    +    .. note:: Experimental
    +
    +    Generalized linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Predictions output by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    --- End diff --
    
    @sethah you changed this from the Scala wording "... gives the prediction value of ..." to "predicted", which I prefer. Would you mind updating the Scala doc to match?
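The diff above follows one pattern. Each Python property or method forwards to the JVM object through `_call_java`. `summary` and `evaluate` wrap the returned object in a Python summary class. The following is a minimal pure-Python sketch of that delegation so it can run without Spark. Every class here (`JavaWrapperSketch`, `GLRModelSketch`, `FakeBackingModel`, and so on) is hypothetical. A plain Python object stands in for the Py4J-backed JVM object, and a `list` stands in for `pyspark.sql.DataFrame` in the type check.

```python
class JavaWrapperSketch(object):
    """Hypothetical stand-in for pyspark's JavaWrapper.

    The real class forwards calls to a JVM object over Py4J; this sketch
    forwards them to an ordinary Python object instead.
    """

    def __init__(self, java_obj):
        self._java_obj = java_obj

    def _call_java(self, name, *args):
        # Look up the member on the backing object; call it if it is a method.
        member = getattr(self._java_obj, name)
        return member(*args) if callable(member) else member


class GLRSummarySketch(JavaWrapperSketch):
    """Read-only properties that simply delegate, as in the diff."""

    @property
    def rank(self):
        return self._call_java("rank")

    @property
    def residualDegreeOfFreedom(self):
        return self._call_java("residualDegreeOfFreedom")


class GLRModelSketch(JavaWrapperSketch):
    @property
    def hasSummary(self):
        return self._call_java("hasSummary")

    @property
    def summary(self):
        # Wrap the backing training summary in the Python-side summary class.
        return GLRSummarySketch(self._call_java("summary"))

    def evaluate(self, dataset):
        # The real method requires a pyspark DataFrame; a list stands in here.
        if not isinstance(dataset, list):
            raise ValueError("dataset must be a list but got %s." % type(dataset))
        return GLRSummarySketch(self._call_java("evaluate", dataset))


class FakeBackingSummary(object):
    """Hypothetical backing summary holding toy values."""

    def __init__(self, num_instances, rank):
        self.rank = rank
        # Residual degrees of freedom: number of instances minus model rank.
        self.residualDegreeOfFreedom = num_instances - rank


class FakeBackingModel(object):
    """Hypothetical backing model that was 'trained' on train_rows."""

    hasSummary = True

    def __init__(self, train_rows, rank=2):
        self._rank = rank
        self.summary = FakeBackingSummary(len(train_rows), rank)

    def evaluate(self, rows):
        return FakeBackingSummary(len(rows), self._rank)


model = GLRModelSketch(FakeBackingModel([1, 2, 3, 4, 5]))
print(model.hasSummary)                                   # True
print(model.summary.residualDegreeOfFreedom)              # 3
print(model.evaluate([1, 2, 3]).residualDegreeOfFreedom)  # 1
```

Keeping the Python classes as thin forwarders means the numeric logic lives in one place, the Scala side. Docstrings such as the `predictionCol` wording are then the main thing to keep in sync between the two APIs.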


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-217532127
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218966820
  
    Merged to master and branch-2.0. Thanks!


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218279351
  
    **[Test build #58265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58265/consoleFull)** for PR 12961 at commit [`d30ab06`](https://github.com/apache/spark/commit/d30ab068e64586d0cca329887d41e637967ef245).


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12961#discussion_r62732446
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -905,6 +906,43 @@ def test_linear_regression_summary(self):
             sameSummary = model.evaluate(df)
             self.assertAlmostEqual(sameSummary.explainedVariance, s.explainedVariance)
     
    +    def test_linear_regression_summary(self):
    --- End diff --
    
    This method has the same name as the one above; should it perhaps be `test_glm_summary`?
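The name clash matters because of how Python builds a class body. A second `def` with the same name rebinds the attribute, so the earlier test is silently replaced and `unittest` never runs it. Here is a small stdlib-only sketch; the class names are hypothetical, and the `pass` bodies stand in for the real tests:

```python
import unittest


class DuplicateNameTests(unittest.TestCase):
    def test_linear_regression_summary(self):
        pass  # stands in for the existing linear regression test

    # Same name again: this def rebinds the class attribute, so the
    # test above is silently replaced and never runs.
    def test_linear_regression_summary(self):  # noqa: F811
        pass  # stands in for the new GLR test


class RenamedTests(unittest.TestCase):
    def test_linear_regression_summary(self):
        pass

    def test_glm_summary(self):
        pass


loader = unittest.TestLoader()
print(loader.getTestCaseNames(DuplicateNameTests))
# ['test_linear_regression_summary']
print(loader.getTestCaseNames(RenamedTests))
# ['test_glm_summary', 'test_linear_regression_summary']
```

Only one test is collected from the first class. Renaming the second method restores both.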


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218259438
  
    A couple of minor comments; pending those, LGTM.


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12961#discussion_r63009814
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1337,6 +1338,204 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, deviance, pValues) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary is None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return GeneralizedLinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on, where dataset is an
    +          instance of :py:class:`pyspark.sql.DataFrame`
    +        """
    +        if not isinstance(dataset, DataFrame):
    +            raise ValueError("dataset must be a DataFrame but got %s." % type(dataset))
    +        java_glr_summary = self._call_java("evaluate", dataset)
    +        return GeneralizedLinearRegressionSummary(java_glr_summary)
    +
    +
    +class GeneralizedLinearRegressionSummary(JavaWrapper):
    +    """
    +    .. note:: Experimental
    +
    +    Generalized linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Predictions output by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    +        """
    +        Field in :py:attr:`predictions` which gives the predicted value of each instance.
    +        This is set to a new column name if the original model's `predictionCol` is not set.
    +        """
    +        return self._call_java("predictionCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def rank(self):
    +        """
    +        The numeric rank of the fitted linear model.
    +        """
    +        return self._call_java("rank")
    +
    +    @property
    +    @since("2.0.0")
    +    def degreesOfFreedom(self):
    +        """
    +        Degrees of freedom.
    +        """
    +        return self._call_java("degreesOfFreedom")
    +
    +    @property
    +    @since("2.0.0")
    +    def residualDegreeOfFreedom(self):
    +        """
    +        The residual degrees of freedom.
    +        """
    +        return self._call_java("residualDegreeOfFreedom")
    +
    +    @property
    +    @since("2.0.0")
    +    def residualDegreeOfFreedomNull(self):
    +        """
    +        The residual degrees of freedom for the null model.
    +        """
    +        return self._call_java("residualDegreeOfFreedomNull")
    +
    +    @since("2.0.0")
    +    def residuals(self, residualsType="deviance"):
    +        """
    +        Get the residuals of the fitted model by type.
    --- End diff --
    
    Minor: please document the default residual type that is returned (deviance residuals).
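The point about the default `residualsType` is easier to see with the residual types side by side. My understanding is that the Scala summary supports `deviance` (default), `pearson`, `working` and `response`. Below is a pure-Python sketch for a single case only: a Poisson family with log link and unit weights. The function name `poisson_log_residuals` and the data are hypothetical. This is not Spark's implementation, which works on the predictions DataFrame and covers every supported family and link.

```python
import math


def poisson_log_residuals(y, mu, residuals_type="deviance"):
    """Residuals of a Poisson GLM with log link, assuming unit weights.

    y: observed counts; mu: fitted means (all > 0).
    """
    residuals = []
    for yi, mi in zip(y, mu):
        if residuals_type == "response":
            r = yi - mi
        elif residuals_type == "working":
            # (y - mu) * g'(mu); for the log link g'(mu) = 1 / mu
            r = (yi - mi) / mi
        elif residuals_type == "pearson":
            # (y - mu) / sqrt(V(mu)); the Poisson variance function is V(mu) = mu
            r = (yi - mi) / math.sqrt(mi)
        elif residuals_type == "deviance":
            # sign(y - mu) * sqrt(unit deviance), where y*log(y/mu) is 0 when y == 0
            ylogy = yi * math.log(yi / mi) if yi > 0 else 0.0
            unit_dev = 2.0 * (ylogy - (yi - mi))
            r = math.copysign(math.sqrt(max(unit_dev, 0.0)), yi - mi)
        else:
            raise ValueError("unsupported residualsType: %s" % residuals_type)
        residuals.append(r)
    return residuals


y = [2, 0, 4]
mu = [1.0, 1.0, 4.0]
print(poisson_log_residuals(y, mu, "pearson"))  # [1.0, -1.0, 0.0]
print(poisson_log_residuals(y, mu))             # deviance residuals, the default
```

Calling the function without `residuals_type` returns deviance residuals, just as `residuals()` with no argument does in the diff. That is why the docstring should state the default explicitly.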


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-217522125
  
    **[Test build #58013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58013/consoleFull)** for PR 12961 at commit [`4c1f4e1`](https://github.com/apache/spark/commit/4c1f4e15ba1ca7ba1c0d46e32a33f5ff89ec6000).


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218842991
  
    **[Test build #58509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58509/consoleFull)** for PR 12961 at commit [`0f816d2`](https://github.com/apache/spark/commit/0f816d2ffa757609cf3c5c69b5757296a9c7078e).


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218290104
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58265/


[GitHub] spark pull request: [SPARK-15181][ML][PYSPARK] Python API for GLR ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12961#issuecomment-218854026
  
    Merged build finished. Test PASSed.

