
Posted to reviews@spark.apache.org by BryanCutler <gi...@git.apache.org> on 2016/03/10 01:09:08 UTC

[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/11621

    [SPARK-13430][PySpark][ML] Python API for training summaries of linear and logistic regression

    ## What changes were proposed in this pull request?
    
    Adding Python API for training summaries of LogisticRegression and LinearRegression in PySpark ML.
    
    ## How was this patch tested?
    Added unit tests to exercise the API calls for the summary classes. Also manually verified that the returned values are as expected and match those from Scala.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark pyspark-ml-summary-SPARK-13430

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11621.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11621
    
----
commit 18047484cf869ae5c6fce32c6b64b9069d709eae
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-03-02T01:33:20Z

    [SPARK-13430] Added summary classes for logistic and linear regression

commit 57f15cd675cd50a82ef479286d1c027b0c7f700b
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-03-02T22:23:54Z

    adding test for ml linear regression training summary

commit 4d4bf1a8766834bb49b7014057bac5c0a7f8a03a
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-03-03T01:32:09Z

    completed test for ml linear regression training summary

commit f9da8e6df323f5c6447d6f9cae771b910023b3ef
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-03-03T21:53:49Z

    adding test for ml logistic regression training summary

commit ce69f9d5d5748f95c63883c5920e59bbae4e3b79
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-03-03T22:53:02Z

    changed residual to only check that DataFrame is returned

commit 5d9bf20341896607b6ceb66e2900883110ec9578
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-03-09T22:58:14Z

    Merge remote-tracking branch 'upstream/master' into pyspark-ml-summary-SPARK-13430
    
    Conflicts:
    	python/pyspark/ml/classification.py
    	python/pyspark/ml/tests.py

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58762595
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -777,10 +777,10 @@ sealed trait LogisticRegressionSummary extends Serializable {
       /** Dataframe outputted by the model's `transform` method. */
       def predictions: DataFrame
     
    -  /** Field in "predictions" which gives the calibrated probability of each instance as a vector. */
    +  /** Field in "predictions" which gives the calibrated probability of each class as a vector. */
       def probabilityCol: String
     
    -  /** Field in "predictions" which gives the true label of each instance. */
    +  /** Field in "predictions" which gives the true label of each instance (if available). */
    --- End diff --
    
    You're right, sounds good



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r56698060
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -223,3 +223,20 @@ def _call_java(self, name, *args):
             sc = SparkContext._active_spark_context
             java_args = [_py2java(sc, arg) for arg in args]
             return _java2py(sc, m(*java_args))
    +
    +
    +class JavaCallable(object):
    --- End diff --
    
    Ideally, I would have liked to reuse parts of `JavaWrapper` and `JavaModel`, but making this small class was the simplest and least intrusive solution.
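
For readers following the thread, here is a minimal sketch of the kind of small delegating class being discussed. It is not the class from the PR: `JavaCallableSketch`, `SummarySketch` and `FakeJavaSummary` are hypothetical names, the fake object stands in for a Py4J proxy, and the `_py2java` / `_java2py` conversion shown in the diff above is deliberately left out.

```python
class JavaCallableSketch(object):
    """Holds a reference to a JVM-side object and forwards calls to it by name."""

    def __init__(self, java_obj):
        self._java_obj = java_obj

    def _call_java(self, name, *args):
        # Look the method up by name on the wrapped object and invoke it.
        # The wrapper in the diff also converts arguments and results
        # between Python and Java; that step is omitted in this sketch.
        method = getattr(self._java_obj, name)
        return method(*args)


class SummarySketch(JavaCallableSketch):
    """Hypothetical summary class exposing one delegated property."""

    @property
    def predictionCol(self):
        return self._call_java("predictionCol")


class FakeJavaSummary(object):
    """Hypothetical stand-in for the Py4J proxy of a Scala summary object."""

    def predictionCol(self):
        return "prediction"


summary = SummarySketch(FakeJavaSummary())
print(summary.predictionCol)
```

Keeping the class this small means a summary only needs a handle to the JVM object and a way to call it, without inheriting the `Params` machinery that `JavaWrapper` and `JavaModel` carry.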



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965953
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,209 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return LinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_lr_summary = self._call_java("evaluate", dataset)
    +        return LinearRegressionSummary(java_lr_summary)
    +
    +
    +class LinearRegressionSummary(JavaCallable):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    +        """
    +        Field in "predictions" which gives the predicted value of
    +        the label at each instance.
    +        """
    +        return self._call_java("predictionCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance.
    --- End diff --
    
    add "(if available)"
    Same for Scala doc
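
The `summary` / `hasSummary` pair in the diff above implies a guard-then-access pattern on the caller's side. The following is a rough sketch of that pattern only, using a pure-Python stand-in: `ModelSketch`, `r2_or_none` and the dictionary used as a summary are all hypothetical and do not come from the PR.

```python
class ModelSketch(object):
    """Hypothetical model that may or may not carry a training summary."""

    def __init__(self, training_summary=None):
        self._training_summary = training_summary

    @property
    def hasSummary(self):
        # True only when a training summary was attached at fit time.
        return self._training_summary is not None

    @property
    def summary(self):
        # Mirrors the documented behaviour: accessing the summary raises
        # when no training summary exists.
        if not self.hasSummary:
            raise RuntimeError("No training summary available for this model")
        return self._training_summary


def r2_or_none(model):
    """Read a metric only after checking that a summary exists."""
    return model.summary["r2"] if model.hasSummary else None


fitted = ModelSketch({"r2": 0.93})  # e.g. a freshly trained model
loaded = ModelSketch()              # e.g. a model without a summary
print(r2_or_none(fitted), r2_or_none(loaded))
```

Checking `hasSummary` first lets calling code avoid the exception path entirely, which is the reason for exposing it as a separate property.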



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58467586
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -238,6 +238,17 @@ def setParams(self, seed=None):
             return self._set(**kwargs)
     
     
    +class HasThrowableProperty(Params):
    --- End diff --
    
    No, that's fine.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-194581849
  
    **[Test build #52782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52782/consoleFull)** for PR 11621 at commit [`5d9bf20`](https://github.com/apache/spark/commit/5d9bf20341896607b6ceb66e2900883110ec9578).



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-194611779
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58466790
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -190,8 +190,30 @@ def _transform(self, dataset):
             return DataFrame(self._java_obj.transform(dataset._jdf), dataset.sql_ctx)
     
     
    +class JavaCallable(object):
    --- End diff --
    
    Good as is for this PR, I'd say.  Yes, please create a JIRA.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58459708
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -190,8 +190,30 @@ def _transform(self, dataset):
             return DataFrame(self._java_obj.transform(dataset._jdf), dataset.sql_ctx)
     
     
    +class JavaCallable(object):
    --- End diff --
    
    Ok, cool.  So is it good enough as-is for this PR, and then I can open another JIRA for the changes mentioned above?



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by thunterdb <gi...@git.apache.org>.

Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r56587642
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -231,6 +232,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_blrt_summary = self._call_java("summary")
    +        return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    """
    +    TODO: enable once Scala API is made public
    +    def evaluate(self, df):
    +        ""
    +        Evaluates the model on a testset.
    +        @param dataset Test dataset to evaluate model on.
    +        ""
    +        java_blr_summary = self._call_java("evaluate", df)
    +        return BinaryLogisticRegressionSummary(java_blr_summary)
    +    """
    +
    +
    +class LogisticRegressionSummary(JavaCallable):
    +    """
    +    Abstraction for Logistic Regression Results for a given model.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    --- End diff --
    
    nit: technically, `outputted` is not in the dictionary.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-194586744
  
    **[Test build #52782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52782/consoleFull)** for PR 11621 at commit [`5d9bf20`](https://github.com/apache/spark/commit/5d9bf20341896607b6ceb66e2900883110ec9578).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by thunterdb <gi...@git.apache.org>.

Github user thunterdb commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-198108360
  
    @BryanCutler  thanks for the PR, I only have a few small comments



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965918
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -233,6 +234,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_blrt_summary = self._call_java("summary")
    +        return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_blr_summary = self._call_java("evaluate", dataset)
    +        return BinaryLogisticRegressionSummary(java_blr_summary)
    +
    +
    +class LogisticRegressionSummary(JavaCallable):
    +    """
    +    Abstraction for Logistic Regression Results for a given model.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def probabilityCol(self):
    +        """
    +        Field in "predictions" which gives the calibrated probability
    +        of each instance as a vector.
    +        """
    +        return self._call_java("probabilityCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance.
    --- End diff --
    
    Add: "(if available)"  Same for Scala doc



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-205520991
  
    **[Test build #54900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54900/consoleFull)** for PR 11621 at commit [`4ba3f73`](https://github.com/apache/spark/commit/4ba3f731c58918c5e2eac13338dc834e411b933f).



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11621



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206039464
  
    **[Test build #55055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55055/consoleFull)** for PR 11621 at commit [`1f030e9`](https://github.com/apache/spark/commit/1f030e91369404535d107a58cfc786f7c9299ab9).



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by thunterdb <gi...@git.apache.org>.

Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r56587676
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -223,3 +223,20 @@ def _call_java(self, name, *args):
             sc = SparkContext._active_spark_context
             java_args = [_py2java(sc, arg) for arg in args]
             return _java2py(sc, m(*java_args))
    +
    +
    +class JavaCallable(object):
    --- End diff --
    
    this looks good to me, but @mengxr will have a more expert opinion on this class.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206515873
  
    LGTM
    Merging with master
    Thanks very much!
    
    Could you please create a JIRA for updating JavaWrapper and JavaCallable?



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206001580
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55028/
    Test FAILed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206029929
  
    **[Test build #55038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55038/consoleFull)** for PR 11621 at commit [`13a10ec`](https://github.com/apache/spark/commit/13a10ecfb24bed6a7708fa1a683855b1416accdd).



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206032447
  
    **[Test build #55038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55038/consoleFull)** for PR 11621 at commit [`13a10ec`](https://github.com/apache/spark/commit/13a10ecfb24bed6a7708fa1a683855b1416accdd).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965849
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -28,6 +28,7 @@
     
     
     __all__ = ['LogisticRegression', 'LogisticRegressionModel',
    +           'BinaryLogisticRegressionSummary', 'BinaryLogisticRegressionTrainingSummary',
    --- End diff --
    
    LogisticRegressionSummary and LogisticRegressionTrainingSummary should probably be public too



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-205535414
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-194603492
  
    **[Test build #52798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52798/consoleFull)** for PR 11621 at commit [`e3ac04c`](https://github.com/apache/spark/commit/e3ac04cfcc9e90a649bd7e46346cee110562b2f7).



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-207004666
  
    @jkbradley, I think adding those methods would make it a little more usable.  Another option might be to add a method that returns the solver used, so the user could key off that before calling the other methods.  It's not as clear-cut as an explicit method returning a bool, but when 'auto' was set it would be useful information to have, and it might help explain why certain methods, like `tValues`, are not available.
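
A minimal sketch of the "key off the solver" idea described in this comment, for illustration only. The class `SolverAwareSummary`, its `solver` property, and the helper `t_values_or_none` are hypothetical names that are not part of this PR or of PySpark. The sketch also assumes that only the `normal` solver produces t-values.

```python
class SolverAwareSummary(object):
    """Hypothetical stand-in for a training summary; not the PySpark class."""

    def __init__(self, solver, t_values=None):
        # Solver that was actually used, i.e. after 'auto' has been resolved.
        self._solver = solver
        self._t_values = t_values

    @property
    def solver(self):
        """Name of the solver used to train the model (hypothetical accessor)."""
        return self._solver

    @property
    def tValues(self):
        """T-statistics, or an error if this solver did not compute them."""
        if self._t_values is None:
            raise RuntimeError(
                "tValues is not available for solver '%s'" % self._solver)
        return self._t_values


def t_values_or_none(summary):
    """Check the solver before touching solver-specific statistics."""
    # Assumption: only the "normal" solver yields t-values.
    if summary.solver == "normal":
        return summary.tValues
    return None
```

With this shape the caller has to know which solvers support which statistics, which is the sense in which it is less clear-cut than an explicit boolean method.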


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58462597
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,209 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return LinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_lr_summary = self._call_java("evaluate", dataset)
    +        return LinearRegressionSummary(java_lr_summary)
    +
    +
    +class LinearRegressionSummary(JavaCallable):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    +        """
    +        Field in "predictions" which gives the predicted value of
    +        the label at each instance.
    +        """
    +        return self._call_java("predictionCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance.
    +        """
    +        return self._call_java("labelCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def explainedVariance(self):
    +        """
    +        Returns the explained variance regression score.
    +        explainedVariance = 1 - variance(y - \hat{y}) / variance(y)
    +        Reference: http://en.wikipedia.org/wiki/Explained_variation
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("explainedVariance")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanAbsoluteError(self):
    +        """
    +        Returns the mean absolute error, which is a risk function
    +        corresponding to the expected value of the absolute error
    +        loss or l1-norm loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanAbsoluteError")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanSquaredError(self):
    +        """
    +        Returns the mean squared error, which is a risk function
    +        corresponding to the expected value of the squared error
    +        loss or quadratic loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def rootMeanSquaredError(self):
    +        """
    +        Returns the root mean squared error, which is defined as the
    +        square root of the mean squared error.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("rootMeanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def r2(self):
    +        """
    +        Returns R^2^, the coefficient of determination.
    +        Reference: http://en.wikipedia.org/wiki/Coefficient_of_determination
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("r2")
    +
    +    @property
    +    @since("2.0.0")
    +    def residuals(self):
    +        """
    +        Residuals (label - predicted value)
    +        """
    +        return self._call_java("residuals")
    +
    +    @property
    +    @since("2.0.0")
    +    def numInstances(self):
    +        """
    +        Number of instances in DataFrame predictions
    +        """
    +        return self._call_java("numInstances")
    +
    +    @property
    +    @since("2.0.0")
    +    def devianceResiduals(self):
    +        """
    +        The weighted residuals, the usual residuals rescaled by the
    +        square root of the instance weights.
    +        """
    +        return self._call_java("devianceResiduals")
    +
    +    @property
    +    @since("2.0.0")
    +    def coefficientStandardErrors(self):
    +        """
    +        Standard error of estimated coefficients and intercept.
    +        """
    +        return self._call_java("coefficientStandardErrors")
    +
    +    @property
    +    @since("2.0.0")
    +    def tValues(self):
    +        """
    +        T-statistic of estimated coefficients and intercept.
    +        """
    +        return self._call_java("tValues")
    +
    +    @property
    +    @since("2.0.0")
    +    def pValues(self):
    +        """
    +        Two-sided p-value of estimated coefficients and intercept.
    +        """
    +        return self._call_java("pValues")
    --- End diff --
    
    My only thought is that when the solver is `auto` (by default), there is a bit more logic to explain when these are valid and maybe it would be better expressed in the user documentation.  What are your thoughts @jkbradley ?


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57453380
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -190,8 +190,30 @@ def _transform(self, dataset):
             return DataFrame(self._java_obj.transform(dataset._jdf), dataset.sql_ctx)
     
     
    +class JavaCallable(object):
    --- End diff --
    
    Can we implement this function by updating ```JavaWrapper```?


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58079097
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,209 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return LinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_lr_summary = self._call_java("evaluate", dataset)
    +        return LinearRegressionSummary(java_lr_summary)
    +
    +
    +class LinearRegressionSummary(JavaCallable):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    +        """
    +        Field in "predictions" which gives the predicted value of
    +        the label at each instance.
    +        """
    +        return self._call_java("predictionCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance.
    +        """
    +        return self._call_java("labelCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def explainedVariance(self):
    +        """
    +        Returns the explained variance regression score.
    +        explainedVariance = 1 - variance(y - \hat{y}) / variance(y)
    +        Reference: http://en.wikipedia.org/wiki/Explained_variation
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("explainedVariance")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanAbsoluteError(self):
    +        """
    +        Returns the mean absolute error, which is a risk function
    +        corresponding to the expected value of the absolute error
    +        loss or l1-norm loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanAbsoluteError")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanSquaredError(self):
    +        """
    +        Returns the mean squared error, which is a risk function
    +        corresponding to the expected value of the squared error
    +        loss or quadratic loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def rootMeanSquaredError(self):
    +        """
    +        Returns the root mean squared error, which is defined as the
    +        square root of the mean squared error.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("rootMeanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def r2(self):
    +        """
    +        Returns R^2^, the coefficient of determination.
    +        Reference: http://en.wikipedia.org/wiki/Coefficient_of_determination
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("r2")
    +
    +    @property
    +    @since("2.0.0")
    +    def residuals(self):
    +        """
    +        Residuals (label - predicted value)
    +        """
    +        return self._call_java("residuals")
    +
    +    @property
    +    @since("2.0.0")
    +    def numInstances(self):
    +        """
    +        Number of instances in DataFrame predictions
    +        """
    +        return self._call_java("numInstances")
    +
    +    @property
    +    @since("2.0.0")
    +    def devianceResiduals(self):
    +        """
    +        The weighted residuals, the usual residuals rescaled by the
    +        square root of the instance weights.
    +        """
    +        return self._call_java("devianceResiduals")
    +
    +    @property
    +    @since("2.0.0")
    +    def coefficientStandardErrors(self):
    +        """
    +        Standard error of estimated coefficients and intercept.
    +        """
    +        return self._call_java("coefficientStandardErrors")
    +
    +    @property
    +    @since("2.0.0")
    +    def tValues(self):
    +        """
    +        T-statistic of estimated coefficients and intercept.
    +        """
    +        return self._call_java("tValues")
    +
    +    @property
    +    @since("2.0.0")
    +    def pValues(self):
    +        """
    +        Two-sided p-value of estimated coefficients and intercept.
    +        """
    +        return self._call_java("pValues")
    +
    +
    +class LinearRegressionTrainingSummary(LinearRegressionSummary):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression training results. Currently, the training summary ignores the
    +    training coefficients except for the objective trace.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def totalIterations(self):
    +        """
    +        Number of training iterations until termination.
    +        """
    --- End diff --
    
    ```totalIterations``` is only valid when the model is trained with the ```l-bfgs``` solver.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206566624
  
    By the way, what do you think about providing methods to test if data are available?
    E.g.,
    ```
    /** Indicates if [[coefficientStandardErrors()]] is available in this summary */
    def hasCoefficientStandardErrors: Boolean
    ```
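
A rough Python counterpart of the boolean check proposed above, as a sketch only. `AvailabilitySummary` is a hypothetical stand-in and not the class added in this PR; the name `hasCoefficientStandardErrors` follows the Scala snippet in this comment, and raising `RuntimeError` is an assumption about how an unavailable value might be reported.

```python
class AvailabilitySummary(object):
    """Hypothetical summary that exposes an explicit availability check."""

    def __init__(self, coefficient_standard_errors=None):
        # None models the case where the solver did not compute the values.
        self._std_errors = coefficient_standard_errors

    @property
    def hasCoefficientStandardErrors(self):
        """Indicates if coefficientStandardErrors is available in this summary."""
        return self._std_errors is not None

    @property
    def coefficientStandardErrors(self):
        """Standard errors, guarded by the availability check."""
        if not self.hasCoefficientStandardErrors:
            raise RuntimeError("coefficientStandardErrors is not available")
        return self._std_errors


def describe(summary):
    """Caller-side usage: test availability instead of catching an exception."""
    if summary.hasCoefficientStandardErrors:
        return "std errors: %s" % summary.coefficientStandardErrors
    return "std errors: n/a"
```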


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r56697758
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -231,6 +232,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_blrt_summary = self._call_java("summary")
    +        return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    """
    --- End diff --
    
    Sure, I can remove this if that's preferable.  I just didn't want to ignore it without knowing the status of the Scala API.  @mengxr @jkbradley is `evaluate` going to be public soon, or is it best not to include this at all?


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-197482757
  
    **[Test build #53335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53335/consoleFull)** for PR 11621 at commit [`460881c`](https://github.com/apache/spark/commit/460881cffcb9b6bce35b822e4a9999325352074d).


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58460722
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -539,6 +539,10 @@ class LinearRegressionTrainingSummary private[regression] (
      * Linear regression results evaluated on a dataset.
      *
      * @param predictions predictions outputted by the model's `transform` method.
    + * @param predictionCol Field in "predictions" which gives the predicted value of the label at
    + *                      each instance.
    + * @param labelCol Field in "predictions" which gives the true label of each instance
    + *                 (if available).
    --- End diff --
    
    I added "(if available)" to these descriptions, but isn't `labelCol` certain to be defined in any dataset that is trained or evaluated on and that leads to a summary object?


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-195123720
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58629672
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -777,10 +777,10 @@ sealed trait LogisticRegressionSummary extends Serializable {
       /** Dataframe outputted by the model's `transform` method. */
       def predictions: DataFrame
     
    -  /** Field in "predictions" which gives the calibrated probability of each instance as a vector. */
    +  /** Field in "predictions" which gives the calibrated probability of each class as a vector. */
       def probabilityCol: String
     
    -  /** Field in "predictions" which gives the true label of each instance. */
    +  /** Field in "predictions" which gives the true label of each instance (if available). */
    --- End diff --
    
    I removed the "(if available)" from every description except this generic trait, because even the `LinearRegressionSummary` class, for instance, assumes that a `labelCol` always exists so that it can calculate the summary metrics.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by thunterdb <gi...@git.apache.org>.

Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r56587628
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -231,6 +232,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_blrt_summary = self._call_java("summary")
    +        return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    """
    --- End diff --
    
    If the code should not be committed, there is no need to include it in comments.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57275251
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -223,3 +223,20 @@ def _call_java(self, name, *args):
             sc = SparkContext._active_spark_context
             java_args = [_py2java(sc, arg) for arg in args]
             return _java2py(sc, m(*java_args))
    +
    +
    +class JavaCallable(object):
    --- End diff --
    
    JavaCallable seems reasonable.  Could you modify it so that JavaModel can inherit from it and eliminate the duplicate code?
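
One way the suggested refactoring could look, as a sketch and not the code that was merged. The `Sketch*` classes and `FakeJavaObject` are hypothetical names used so the example runs without Spark. The sketch also leaves out the `_py2java`/`_java2py` conversions shown in the quoted `_call_java` and simply forwards the call.

```python
class SketchJavaCallable(object):
    """Holds a wrapped object and forwards named method calls to it."""

    def __init__(self, java_obj=None):
        self._java_obj = java_obj

    def _call_java(self, name, *args):
        # Look the method up by name on the wrapped object and invoke it.
        # The real wrapper also converts arguments and results between
        # Python and Java; that step is omitted in this sketch.
        method = getattr(self._java_obj, name)
        return method(*args)


class SketchJavaModel(SketchJavaCallable):
    """Model wrapper that inherits _call_java instead of duplicating it."""

    @property
    def intercept(self):
        return self._call_java("intercept")


class SketchSummary(SketchJavaCallable):
    """Summary wrapper that reuses the same inherited helper."""

    @property
    def r2(self):
        return self._call_java("r2")


class FakeJavaObject(object):
    """Stand-in for the Py4J object, with fixed return values."""

    def intercept(self):
        return 0.5

    def r2(self):
        return 0.9
```

Both wrappers share the single `_call_java` defined on the base class, which is the duplication the comment asks to eliminate.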


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58467841
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,209 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return LinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_lr_summary = self._call_java("evaluate", dataset)
    +        return LinearRegressionSummary(java_lr_summary)
    +
    +
    +class LinearRegressionSummary(JavaCallable):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    +        """
    +        Field in "predictions" which gives the predicted value of
    +        the label at each instance.
    +        """
    +        return self._call_java("predictionCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance.
    +        """
    +        return self._call_java("labelCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def explainedVariance(self):
    +        """
    +        Returns the explained variance regression score.
    +        explainedVariance = 1 - variance(y - \hat{y}) / variance(y)
    +        Reference: http://en.wikipedia.org/wiki/Explained_variation
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("explainedVariance")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanAbsoluteError(self):
    +        """
    +        Returns the mean absolute error, which is a risk function
    +        corresponding to the expected value of the absolute error
    +        loss or l1-norm loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanAbsoluteError")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanSquaredError(self):
    +        """
    +        Returns the mean squared error, which is a risk function
    +        corresponding to the expected value of the squared error
    +        loss or quadratic loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def rootMeanSquaredError(self):
    +        """
    +        Returns the root mean squared error, which is defined as the
    +        square root of the mean squared error.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("rootMeanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def r2(self):
    +        """
    +        Returns R^2^, the coefficient of determination.
    +        Reference: http://en.wikipedia.org/wiki/Coefficient_of_determination
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("r2")
    +
    +    @property
    +    @since("2.0.0")
    +    def residuals(self):
    +        """
    +        Residuals (label - predicted value)
    +        """
    +        return self._call_java("residuals")
    +
    +    @property
    +    @since("2.0.0")
    +    def numInstances(self):
    +        """
    +        Number of instances in DataFrame predictions
    +        """
    +        return self._call_java("numInstances")
    +
    +    @property
    +    @since("2.0.0")
    +    def devianceResiduals(self):
    +        """
    +        The weighted residuals, the usual residuals rescaled by the
    +        square root of the instance weights.
    +        """
    +        return self._call_java("devianceResiduals")
    +
    +    @property
    +    @since("2.0.0")
    +    def coefficientStandardErrors(self):
    +        """
    +        Standard error of estimated coefficients and intercept.
    +        """
    +        return self._call_java("coefficientStandardErrors")
    +
    +    @property
    +    @since("2.0.0")
    +    def tValues(self):
    +        """
    +        T-statistic of estimated coefficients and intercept.
    +        """
    +        return self._call_java("tValues")
    +
    +    @property
    +    @since("2.0.0")
    +    def pValues(self):
    +        """
    +        Two-sided p-value of estimated coefficients and intercept.
    +        """
    +        return self._call_java("pValues")
    --- End diff --
    
    +1 for noting that they may not be available depending on the solver.  How about:
    ```
    This value is only available when using the "normal" solver.
    .. seealso:: :py:attr:`LinearRegression.solver`
    ```
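For illustration, the suggested wording could sit in a property docstring roughly as below. This is only a sketch: `LinearRegressionSummarySketch` and its constructor are hypothetical stand-ins so the snippet runs without Spark, and only the docstring text comes from the comment above.

```python
class LinearRegressionSummarySketch(object):
    """Hypothetical stand-in for the summary class quoted in the diff."""

    def __init__(self, p_values):
        # The real class reads this from the JVM object; here it is just stored.
        self._p_values = p_values

    @property
    def pValues(self):
        """
        Two-sided p-value of estimated coefficients and intercept.

        This value is only available when using the "normal" solver.

        .. seealso:: :py:attr:`LinearRegression.solver`
        """
        return self._p_values
```

The same two lines would presumably apply to the other solver-dependent properties (`coefficientStandardErrors`, `tValues`, `devianceResiduals`), but that is an assumption, not something stated in the comment.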


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206071038
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55055/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965908
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -233,6 +234,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_blrt_summary = self._call_java("summary")
    +        return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_blr_summary = self._call_java("evaluate", dataset)
    --- End diff --
    
    Check that this is a DataFrame
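A minimal sketch of such a check, assuming it is done on the Python side before the call reaches the JVM. The `DataFrame` class here is a stand-in so the snippet runs without Spark (the real type would be `pyspark.sql.DataFrame`), and `check_is_dataframe` is a hypothetical helper name.

```python
class DataFrame(object):
    """Stand-in for pyspark.sql.DataFrame so the sketch runs without Spark."""


def check_is_dataframe(dataset):
    """Fail early with a readable message if `dataset` is not a DataFrame."""
    if not isinstance(dataset, DataFrame):
        raise ValueError(
            "dataset must be a DataFrame but got %s." % type(dataset))
    return dataset
```

Inside `evaluate`, the check would run before `self._call_java("evaluate", dataset)`, so a wrong argument type surfaces as a Python error rather than an opaque Py4J one.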



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-198026652
  
    **[Test build #53449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53449/consoleFull)** for PR 11621 at commit [`d7e17ab`](https://github.com/apache/spark/commit/d7e17ab6ab7219394d08b205e892f383f7ca1641).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965893
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -233,6 +234,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_blrt_summary = self._call_java("summary")
    --- End diff --
    
    Add note that this will need to be updated once multiclass support is added.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206001577
  
    Build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206028397
  
    jenkins retest this please



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58460300
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,228 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary is None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return LinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_lr_summary = self._call_java("evaluate", dataset)
    +        return LinearRegressionSummary(java_lr_summary)
    +
    +
    +class LinearRegressionSummary(JavaCallable):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    +        """
    +        Field in "predictions" which gives the predicted value of
    +        the label at each instance.
    +        """
    +        return self._call_java("predictionCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance (if available).
    +        """
    +        return self._call_java("labelCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def explainedVariance(self):
    +        """
    +        Returns the explained variance regression score.
    +        explainedVariance = 1 - variance(y - \hat{y}) / variance(y)
    +        Reference: http://en.wikipedia.org/wiki/Explained_variation
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("explainedVariance")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanAbsoluteError(self):
    +        """
    +        Returns the mean absolute error, which is a risk function
    +        corresponding to the expected value of the absolute error
    +        loss or l1-norm loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanAbsoluteError")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanSquaredError(self):
    +        """
    +        Returns the mean squared error, which is a risk function
    +        corresponding to the expected value of the squared error
    +        loss or quadratic loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def rootMeanSquaredError(self):
    +        """
    +        Returns the root mean squared error, which is defined as the
    +        square root of the mean squared error.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("rootMeanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def r2(self):
    +        """
    +        Returns R^2^, the coefficient of determination.
    +        Reference: http://en.wikipedia.org/wiki/Coefficient_of_determination
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("r2")
    +
    +    @property
    +    @since("2.0.0")
    +    def residuals(self):
    +        """
    +        Residuals (label - predicted value)
    +        """
    +        return self._call_java("residuals")
    +
    +    @property
    +    @since("2.0.0")
    +    def numInstances(self):
    +        """
    +        Number of instances in DataFrame predictions
    +        """
    +        return self._call_java("numInstances")
    +
    +    @property
    +    @since("2.0.0")
    +    def devianceResiduals(self):
    +        """
    +        The weighted residuals, the usual residuals rescaled by the
    +        square root of the instance weights.
    +        """
    +        return self._call_java("devianceResiduals")
    +
    +    @property
    +    @since("2.0.0")
    +    def coefficientStandardErrors(self):
    +        """
    +        Standard error of estimated coefficients and intercept.
    +        """
    +        return self._call_java("coefficientStandardErrors")
    +
    +    @property
    +    @since("2.0.0")
    +    def tValues(self):
    +        """
    +        T-statistic of estimated coefficients and intercept.
    +        """
    +        return self._call_java("tValues")
    +
    +    @property
    +    @since("2.0.0")
    +    def pValues(self):
    +        """
    +        Two-sided p-value of estimated coefficients and intercept.
    +        """
    +        return self._call_java("pValues")
    +
    +
    +@inherit_doc
    +class LinearRegressionTrainingSummary(LinearRegressionSummary):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression training results. Currently, the training summary ignores the
    +    training weights except for the objective trace.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def featuresCol(self):
    --- End diff --
    
    @jkbradley shouldn't `featuresCol` be part of LinearRegressionSummary instead?



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-195123722
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52875/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965857
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -233,6 +234,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    --- End diff --
    
    trainingSummary is None (for Python)
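As background for this wording change, a small sketch of why Python code and docs prefer `is None` over `== None`: `==` can be overridden by a class, while `is` tests identity. The `AlwaysEqual` class is a contrived example, not anything from the PR.

```python
class AlwaysEqual(object):
    """Contrived class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# Equality goes through the overridden __eq__, so this is True
# even though obj is clearly not None.
equals_none = (obj == None)  # noqa: E711

# Identity cannot be overridden, so this is False as expected.
is_none = (obj is None)
```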



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57274042
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -231,6 +232,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_blrt_summary = self._call_java("summary")
    +        return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    """
    --- End diff --
    
    I want to make it public.  I'll send a PR now.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-198474378
  
    Thanks for checking this out and the comments @thunterdb!



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-197492294
  
    **[Test build #53335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53335/consoleFull)** for PR 11621 at commit [`460881c`](https://github.com/apache/spark/commit/460881cffcb9b6bce35b822e4a9999325352074d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58459975
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -238,6 +238,17 @@ def setParams(self, seed=None):
             return self._set(**kwargs)
     
     
    +class HasThrowableProperty(Params):
    --- End diff --
    
    I realized I kind of placed this in a weird place in the file previously, so I moved it next to the test case that was using it to be consistent with the other tests.  It's not related to this PR, so would you prefer I move it back?



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-195123600
  
    **[Test build #52875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52875/consoleFull)** for PR 11621 at commit [`8d0f01a`](https://github.com/apache/spark/commit/8d0f01a269adc8dca3383ecb9e4bf4f780806984).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class JavaCallable(object):`
      * `class JavaModelWrapper(object):`



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-207016359
  
    So if we included something like the following in an example, it should be clear what's happening
    
    ```Scala
    val trainingSummary = lrModel.summary
    trainingSummary.getSolver() match {
      case "normal" =>
        println(trainingSummary.coefficientStandardErrors.mkString(","))
        ...
      case "l-bfgs" =>
        println(trainingSummary.totalIterations)
        ...
    }
    ```
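A rough Python counterpart to the Scala snippet above, as a sketch only. `FakeTrainingSummary` and `describe_summary` are hypothetical names, and the solver is passed in explicitly here because the thread does not confirm a `getSolver()` accessor on the summary.

```python
class FakeTrainingSummary(object):
    """Hypothetical stand-in for a training summary; not the PySpark class."""

    def __init__(self, coefficientStandardErrors=None, totalIterations=None):
        self.coefficientStandardErrors = coefficientStandardErrors
        self.totalIterations = totalIterations


def describe_summary(summary, solver):
    """Touch only the fields that are meaningful for the given solver."""
    if solver == "normal":
        errors = summary.coefficientStandardErrors
        return "standard errors: " + ",".join(str(e) for e in errors)
    elif solver == "l-bfgs":
        return "iterations: %d" % summary.totalIterations
    raise ValueError("unsupported solver: %r" % solver)
```

Branching on the solver before reading a field keeps the example from calling properties that are unavailable for that solver.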



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206032472
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55038/
    Test FAILed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-203640655
  
    I had several minor comments, and there were 2 missing methods.  Otherwise, it's looking good.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-195120451
  
    **[Test build #52875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52875/consoleFull)** for PR 11621 at commit [`8d0f01a`](https://github.com/apache/spark/commit/8d0f01a269adc8dca3383ecb9e4bf4f780806984).



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-201060245
  
    Made `JavaModel` inherit `JavaCallable`, please take a look when you get a chance @jkbradley and see if this change is ok, thanks!
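To make the shape of that change concrete, here is a simplified sketch of the hierarchy. The class names `JavaCallable`, `JavaModel`, and `LinearRegressionSummary` come from the thread; the method bodies and `FakeJavaSummary` are assumptions so the snippet runs without Py4J, and the real `_call_java` does more (argument and result conversion through the gateway).

```python
class JavaCallable(object):
    """Simplified wrapper that forwards calls to an underlying object."""

    def __init__(self, java_obj=None):
        self._java_obj = java_obj

    def _call_java(self, name, *args):
        # Assumption: look the member up by name on the wrapped object.
        # The real implementation goes through Py4J instead.
        member = getattr(self._java_obj, name)
        return member(*args) if callable(member) else member


class JavaModel(JavaCallable):
    """Models reuse _call_java from JavaCallable instead of redefining it."""


class LinearRegressionSummary(JavaCallable):
    """Summaries share the same calling mechanism as models."""

    @property
    def r2(self):
        return self._call_java("r2")


class FakeJavaSummary(object):
    """Hypothetical stand-in for the JVM-side summary object."""

    def r2(self):
        return 0.75
```

With this layout, model and summary wrappers share one `_call_java` implementation rather than each carrying its own copy.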



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965964
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,209 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return LinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_lr_summary = self._call_java("evaluate", dataset)
    +        return LinearRegressionSummary(java_lr_summary)
    +
    +
    +class LinearRegressionSummary(JavaCallable):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    +        """
    +        Field in "predictions" which gives the predicted value of
    +        the label at each instance.
    +        """
    +        return self._call_java("predictionCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance.
    +        """
    +        return self._call_java("labelCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def explainedVariance(self):
    +        """
    +        Returns the explained variance regression score.
    +        explainedVariance = 1 - variance(y - \hat{y}) / variance(y)
    +        Reference: http://en.wikipedia.org/wiki/Explained_variation
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("explainedVariance")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanAbsoluteError(self):
    +        """
    +        Returns the mean absolute error, which is a risk function
    +        corresponding to the expected value of the absolute error
    +        loss or l1-norm loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanAbsoluteError")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanSquaredError(self):
    +        """
    +        Returns the mean squared error, which is a risk function
    +        corresponding to the expected value of the squared error
    +        loss or quadratic loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def rootMeanSquaredError(self):
    +        """
    +        Returns the root mean squared error, which is defined as the
    +        square root of the mean squared error.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("rootMeanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def r2(self):
    +        """
    +        Returns R^2^, the coefficient of determination.
    +        Reference: http://en.wikipedia.org/wiki/Coefficient_of_determination
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("r2")
    +
    +    @property
    +    @since("2.0.0")
    +    def residuals(self):
    +        """
    +        Residuals (label - predicted value)
    +        """
    +        return self._call_java("residuals")
    +
    +    @property
    +    @since("2.0.0")
    +    def numInstances(self):
    +        """
    +        Number of instances in DataFrame predictions
    +        """
    +        return self._call_java("numInstances")
    +
    +    @property
    +    @since("2.0.0")
    +    def devianceResiduals(self):
    +        """
    +        The weighted residuals, the usual residuals rescaled by the
    +        square root of the instance weights.
    +        """
    +        return self._call_java("devianceResiduals")
    +
    +    @property
    +    @since("2.0.0")
    +    def coefficientStandardErrors(self):
    +        """
    +        Standard error of estimated coefficients and intercept.
    +        """
    +        return self._call_java("coefficientStandardErrors")
    +
    +    @property
    +    @since("2.0.0")
    +    def tValues(self):
    +        """
    +        T-statistic of estimated coefficients and intercept.
    +        """
    +        return self._call_java("tValues")
    +
    +    @property
    +    @since("2.0.0")
    +    def pValues(self):
    +        """
    +        Two-sided p-value of estimated coefficients and intercept.
    +        """
    +        return self._call_java("pValues")
    +
    +
    +class LinearRegressionTrainingSummary(LinearRegressionSummary):
    --- End diff --
    
    Please add `inherit_doc` to this class.
    
    Also, add `featuresCol` and `objectiveHistory`.
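
For readers unfamiliar with the idea behind `inherit_doc`, here is a minimal sketch of a doc-inheriting class decorator. The name `inherit_doc_sketch` and the `Base`/`Child` classes are hypothetical and exist only for illustration; this is a simplified approximation, not the PySpark implementation.

```python
def inherit_doc_sketch(cls):
    """Copy missing docstrings from base classes onto cls's own public members."""
    for name, member in vars(cls).items():
        # Skip private names and members that already carry a docstring.
        if name.startswith("_") or getattr(member, "__doc__", None):
            continue
        for base in cls.__mro__[1:]:
            inherited = getattr(base, name, None)
            if inherited is not None and inherited.__doc__:
                try:
                    member.__doc__ = inherited.__doc__
                except (AttributeError, TypeError):
                    # Some member types do not allow their docstring to be set.
                    pass
                break
    return cls


class Base(object):
    def evaluate(self):
        """Evaluates the model."""
        return 1


@inherit_doc_sketch
class Child(Base):
    def evaluate(self):  # no docstring of its own
        return 2
```

With the decorator applied, `Child.evaluate` reports the docstring written on `Base.evaluate`, so a subclass such as a training summary would not need to repeat the documentation of the members it overrides.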



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58078950
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,209 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return LinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_lr_summary = self._call_java("evaluate", dataset)
    +        return LinearRegressionSummary(java_lr_summary)
    +
    +
    +class LinearRegressionSummary(JavaCallable):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    +        """
    +        Field in "predictions" which gives the predicted value of
    +        the label at each instance.
    +        """
    +        return self._call_java("predictionCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance.
    +        """
    +        return self._call_java("labelCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def explainedVariance(self):
    +        """
    +        Returns the explained variance regression score.
    +        explainedVariance = 1 - variance(y - \hat{y}) / variance(y)
    +        Reference: http://en.wikipedia.org/wiki/Explained_variation
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("explainedVariance")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanAbsoluteError(self):
    +        """
    +        Returns the mean absolute error, which is a risk function
    +        corresponding to the expected value of the absolute error
    +        loss or l1-norm loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanAbsoluteError")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanSquaredError(self):
    +        """
    +        Returns the mean squared error, which is a risk function
    +        corresponding to the expected value of the squared error
    +        loss or quadratic loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def rootMeanSquaredError(self):
    +        """
    +        Returns the root mean squared error, which is defined as the
    +        square root of the mean squared error.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("rootMeanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def r2(self):
    +        """
    +        Returns R^2^, the coefficient of determination.
    +        Reference: http://en.wikipedia.org/wiki/Coefficient_of_determination
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("r2")
    +
    +    @property
    +    @since("2.0.0")
    +    def residuals(self):
    +        """
    +        Residuals (label - predicted value)
    +        """
    +        return self._call_java("residuals")
    +
    +    @property
    +    @since("2.0.0")
    +    def numInstances(self):
    +        """
    +        Number of instances in DataFrame predictions
    +        """
    +        return self._call_java("numInstances")
    +
    +    @property
    +    @since("2.0.0")
    +    def devianceResiduals(self):
    +        """
    +        The weighted residuals, the usual residuals rescaled by the
    +        square root of the instance weights.
    +        """
    +        return self._call_java("devianceResiduals")
    +
    +    @property
    +    @since("2.0.0")
    +    def coefficientStandardErrors(self):
    +        """
    +        Standard error of estimated coefficients and intercept.
    +        """
    +        return self._call_java("coefficientStandardErrors")
    +
    +    @property
    +    @since("2.0.0")
    +    def tValues(self):
    +        """
    +        T-statistic of estimated coefficients and intercept.
    +        """
    +        return self._call_java("tValues")
    +
    +    @property
    +    @since("2.0.0")
    +    def pValues(self):
    +        """
    +        Two-sided p-value of estimated coefficients and intercept.
    +        """
    +        return self._call_java("pValues")
    --- End diff --
    
    Should we clarify that ```coefficientStandardErrors```, ```tValues``` and ```pValues``` are only valid when the ```LinearRegressionModel``` is trained with the ```normal``` solver? We may also need to update the Scala doc.
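
One way to picture the restriction raised here is a property that refuses to return a value unless the model was fit with the `normal` solver. The sketch below is purely illustrative: `SummarySketch`, its constructor arguments, and the error message are hypothetical, and it assumes (as the comment suggests) that these statistics are not available under other solvers. It does not reproduce the Spark classes.

```python
class SummarySketch(object):
    """Hypothetical stand-in for a regression summary; not the PySpark class."""

    def __init__(self, solver, p_values):
        self._solver = solver
        self._p_values = p_values

    @property
    def pValues(self):
        # Assumed behaviour: the statistic is only defined for the "normal" solver.
        if self._solver != "normal":
            raise RuntimeError(
                "pValues is only available when the model is trained with "
                "the 'normal' solver, got %r" % self._solver)
        return self._p_values
```

Documenting the restriction in the docstring, as proposed, lets callers know in advance that reading the property may fail for models trained another way.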



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965947
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,209 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    --- End diff --
    
    "==" --> "is"
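
The reason for preferring `is` when comparing against `None` can be shown with a small sketch. `AlwaysEqual` is a hypothetical class written only for this illustration; it overrides `__eq__` so that an equality test against `None` gives a misleading answer, while the identity test does not.

```python
class AlwaysEqual(object):
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# Equality can be overridden, so this is True even though obj is not None.
equals_none = (obj == None)  # noqa: E711

# Identity cannot be overridden; only the None singleton passes this check.
is_none = (obj is None)
```

Here `equals_none` ends up `True` and `is_none` ends up `False`, which is why documentation and code that describe a `None` check conventionally use `is`.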



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-198026901
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206070844
  
    **[Test build #55055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55055/consoleFull)** for PR 11621 at commit [`1f030e9`](https://github.com/apache/spark/commit/1f030e91369404535d107a58cfc786f7c9299ab9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57958007
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -190,8 +190,30 @@ def _transform(self, dataset):
             return DataFrame(self._java_obj.transform(dataset._jdf), dataset.sql_ctx)
     
     
    +class JavaCallable(object):
    --- End diff --
    
    I agree with having JavaWrapper inherit from JavaCallable.  Renaming sounds reasonable, but doing it in another PR sounds like a good idea.
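
The inheritance being discussed can be sketched roughly as follows. Only the class names `JavaCallable` and `JavaWrapper` and the method name `_call_java` come from the diff and comments; the bodies are assumptions, `FakeJavaObject` is a hypothetical stand-in for a Py4J proxy, and the argument and result conversion that a real bridge would need is omitted.

```python
class FakeJavaObject(object):
    """Hypothetical stand-in for a Py4J proxy object."""

    def intercept(self):
        return 0.5

    def scale(self, factor):
        return 2.0 * factor


class JavaCallable(object):
    """Holds a Java-side object and forwards named method calls to it."""

    def __init__(self, java_obj=None):
        self._java_obj = java_obj

    def _call_java(self, name, *args):
        # Look the method up by name on the wrapped object and invoke it.
        method = getattr(self._java_obj, name)
        return method(*args)


class JavaWrapper(JavaCallable):
    """Inherits _call_java instead of defining its own copy."""
```

Under this arrangement the call-forwarding logic lives in one base class, and both wrappers and summary classes can reuse it, for example `JavaWrapper(FakeJavaObject())._call_java("intercept")`.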



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206032471
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-197492522
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-194586858
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52782/
    Test FAILed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-205535269
  
    **[Test build #54900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54900/consoleFull)** for PR 11621 at commit [`4ba3f73`](https://github.com/apache/spark/commit/4ba3f731c58918c5e2eac13338dc834e411b933f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-201065047
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54108/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by thunterdb <gi...@git.apache.org>.

Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r56587656
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,209 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return LinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    """
    --- End diff --
    
    same thing here



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57976400
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,209 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return LinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_lr_summary = self._call_java("evaluate", dataset)
    +        return LinearRegressionSummary(java_lr_summary)
    +
    +
    +class LinearRegressionSummary(JavaCallable):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    +        """
    +        Field in "predictions" which gives the predicted value of
    +        the label at each instance.
    +        """
    +        return self._call_java("predictionCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance.
    +        """
    +        return self._call_java("labelCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def explainedVariance(self):
    +        """
    +        Returns the explained variance regression score.
    +        explainedVariance = 1 - variance(y - \hat{y}) / variance(y)
    +        Reference: http://en.wikipedia.org/wiki/Explained_variation
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("explainedVariance")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanAbsoluteError(self):
    +        """
    +        Returns the mean absolute error, which is a risk function
    +        corresponding to the expected value of the absolute error
    +        loss or l1-norm loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanAbsoluteError")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanSquaredError(self):
    +        """
    +        Returns the mean squared error, which is a risk function
    +        corresponding to the expected value of the squared error
    +        loss or quadratic loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def rootMeanSquaredError(self):
    +        """
    +        Returns the root mean squared error, which is defined as the
    +        square root of the mean squared error.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("rootMeanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def r2(self):
    +        """
    +        Returns R^2^, the coefficient of determination.
    +        Reference: http://en.wikipedia.org/wiki/Coefficient_of_determination
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("r2")
    +
    +    @property
    +    @since("2.0.0")
    +    def residuals(self):
    +        """
    +        Residuals (label - predicted value)
    +        """
    +        return self._call_java("residuals")
    +
    +    @property
    +    @since("2.0.0")
    +    def numInstances(self):
    +        """
    +        Number of instances in DataFrame predictions
    +        """
    +        return self._call_java("numInstances")
    +
    +    @property
    +    @since("2.0.0")
    +    def devianceResiduals(self):
    +        """
    +        The weighted residuals, the usual residuals rescaled by the
    +        square root of the instance weights.
    +        """
    +        return self._call_java("devianceResiduals")
    +
    +    @property
    +    @since("2.0.0")
    +    def coefficientStandardErrors(self):
    +        """
    +        Standard error of estimated coefficients and intercept.
    +        """
    +        return self._call_java("coefficientStandardErrors")
    +
    +    @property
    +    @since("2.0.0")
    +    def tValues(self):
    +        """
    +        T-statistic of estimated coefficients and intercept.
    +        """
    +        return self._call_java("tValues")
    +
    +    @property
    +    @since("2.0.0")
    +    def pValues(self):
    +        """
    +        Two-sided p-value of estimated coefficients and intercept.
    +        """
    +        return self._call_java("pValues")
    +
    +
    +class LinearRegressionTrainingSummary(LinearRegressionSummary):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression training results. Currently, the training summary ignores the
    +    training coefficients except for the objective trace.
    --- End diff --
    
    "coefficients" --> "weights"  This is a bug from the Scala doc; can you fix it there too?



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-205535415
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54900/


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-207048048
  
    Good point, let's expose the solver for now. I made a JIRA: https://issues.apache.org/jira/browse/SPARK-14461


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-198026904
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53449/


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r56384119
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -34,10 +34,12 @@ class JavaWrapper(Params):
     
         __metaclass__ = ABCMeta
     
    -    #: The wrapped Java companion object. Subclasses should initialize
    -    #: it properly. The param values in the Java object should be
    -    #: synced with the Python wrapper in fit/transform/evaluate/copy.
    -    _java_obj = None
    --- End diff --
    
    Never mind, I opened a separate JIRA, [SPARK-13937](https://issues.apache.org/jira/browse/SPARK-13937), for this since it is not really related to the issue in this PR.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206071034
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58467928
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -151,6 +151,228 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary is None`.
    +        """
    +        java_lrt_summary = self._call_java("summary")
    +        return LinearRegressionTrainingSummary(java_lrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_lr_summary = self._call_java("evaluate", dataset)
    +        return LinearRegressionSummary(java_lr_summary)
    +
    +
    +class LinearRegressionSummary(JavaCallable):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression results evaluated on a dataset.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def predictionCol(self):
    +        """
    +        Field in "predictions" which gives the predicted value of
    +        the label at each instance.
    +        """
    +        return self._call_java("predictionCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance (if available).
    +        """
    +        return self._call_java("labelCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def explainedVariance(self):
    +        """
    +        Returns the explained variance regression score.
    +        explainedVariance = 1 - variance(y - \hat{y}) / variance(y)
    +        Reference: http://en.wikipedia.org/wiki/Explained_variation
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("explainedVariance")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanAbsoluteError(self):
    +        """
    +        Returns the mean absolute error, which is a risk function
    +        corresponding to the expected value of the absolute error
    +        loss or l1-norm loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanAbsoluteError")
    +
    +    @property
    +    @since("2.0.0")
    +    def meanSquaredError(self):
    +        """
    +        Returns the mean squared error, which is a risk function
    +        corresponding to the expected value of the squared error
    +        loss or quadratic loss.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("meanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def rootMeanSquaredError(self):
    +        """
    +        Returns the root mean squared error, which is defined as the
    +        square root of the mean squared error.
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("rootMeanSquaredError")
    +
    +    @property
    +    @since("2.0.0")
    +    def r2(self):
    +        """
    +        Returns R^2^, the coefficient of determination.
    +        Reference: http://en.wikipedia.org/wiki/Coefficient_of_determination
    +
    +        Note: This ignores instance weights (setting all to 1.0) from
    +        `LinearRegression.weightCol`. This will change in later Spark
    +        versions.
    +        """
    +        return self._call_java("r2")
    +
    +    @property
    +    @since("2.0.0")
    +    def residuals(self):
    +        """
    +        Residuals (label - predicted value)
    +        """
    +        return self._call_java("residuals")
    +
    +    @property
    +    @since("2.0.0")
    +    def numInstances(self):
    +        """
    +        Number of instances in DataFrame predictions
    +        """
    +        return self._call_java("numInstances")
    +
    +    @property
    +    @since("2.0.0")
    +    def devianceResiduals(self):
    +        """
    +        The weighted residuals, the usual residuals rescaled by the
    +        square root of the instance weights.
    +        """
    +        return self._call_java("devianceResiduals")
    +
    +    @property
    +    @since("2.0.0")
    +    def coefficientStandardErrors(self):
    +        """
    +        Standard error of estimated coefficients and intercept.
    +        """
    +        return self._call_java("coefficientStandardErrors")
    +
    +    @property
    +    @since("2.0.0")
    +    def tValues(self):
    +        """
    +        T-statistic of estimated coefficients and intercept.
    +        """
    +        return self._call_java("tValues")
    +
    +    @property
    +    @since("2.0.0")
    +    def pValues(self):
    +        """
    +        Two-sided p-value of estimated coefficients and intercept.
    +        """
    +        return self._call_java("pValues")
    +
    +
    +@inherit_doc
    +class LinearRegressionTrainingSummary(LinearRegressionSummary):
    +    """
    +    .. note:: Experimental
    +
    +    Linear regression training results. Currently, the training summary ignores the
    +    training weights except for the objective trace.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def featuresCol(self):
    --- End diff --
    
    Yes, it should.  Could you please update it in Python and Scala?


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206451858
  
    @jkbradley I incorporated the latest feedback. Please take another look when you get a chance, thanks!


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-201064969
  
    **[Test build #54108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54108/consoleFull)** for PR 11621 at commit [`3571838`](https://github.com/apache/spark/commit/3571838b60fb1f0cf869e89e127b6ad7b95bd3b3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-205547478
  
    @BryanCutler Thanks for the updates!  They look good.  I just commented on some of the discussions above.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965913
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -233,6 +234,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_blrt_summary = self._call_java("summary")
    +        return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_blr_summary = self._call_java("evaluate", dataset)
    +        return BinaryLogisticRegressionSummary(java_blr_summary)
    +
    +
    +class LogisticRegressionSummary(JavaCallable):
    +    """
    +    Abstraction for Logistic Regression Results for a given model.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def probabilityCol(self):
    +        """
    +        Field in "predictions" which gives the calibrated probability
    +        of each instance as a vector.
    --- End diff --
    
    "instance" --> "class"  This looks like an error from the Scala doc; could you please fix it there too?


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57661783
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -190,8 +190,30 @@ def _transform(self, dataset):
             return DataFrame(self._java_obj.transform(dataset._jdf), dataset.sql_ctx)
     
     
    +class JavaCallable(object):
    --- End diff --
    
    `JavaWrapper` does define a java object wrapper variable, but also does more since it is derived from `Params`.  Since I need to wrap a Java Summary object (which is just a plain Java object), it doesn't really make sense to use it unless we want to change that.
    
    Actually, I think it might make more sense to have `JavaWrapper` inherit from `JavaCallable` and only define `_java_obj` there.  Then rename the classes a little like this to better reflect what they are for:
    
    `JavaCallable(object)` -> `JavaWrapper(object)`
    and
    `JavaWrapper(Params)` -> `JavaParamWrapper(JavaWrapper, Params)`
    
    That might be better done in another JIRA though, because of the name change
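
A minimal sketch of the hierarchy proposed above, written only to illustrate the idea and not taken from the PR. It assumes a simplified stand-in for `Params` and a hypothetical `FakeJavaSummary` in place of a real JVM object, so it runs without Spark; the real `_call_java` also converts arguments and results between Python and Java, which is omitted here.

```python
class Params(object):
    """Simplified stand-in for pyspark.ml.param.Params (assumption)."""

    def __init__(self):
        self._paramMap = {}


class JavaWrapper(object):
    """Proposed base class: only holds a reference to a plain JVM object."""

    def __init__(self, java_obj=None):
        self._java_obj = java_obj

    def _call_java(self, name, *args):
        # Real code would convert args/results via py4j helpers; this
        # sketch simply forwards the call to the wrapped object.
        return getattr(self._java_obj, name)(*args)


class JavaParamWrapper(JavaWrapper, Params):
    """Proposed name for today's JavaWrapper(Params): wrapper plus params."""

    def __init__(self, java_obj=None):
        JavaWrapper.__init__(self, java_obj)
        Params.__init__(self)


class FakeJavaSummary(object):
    """Hypothetical stand-in for a JVM summary object."""

    def r2(self):
        return 0.5


class LinearRegressionSummary(JavaWrapper):
    """A summary needs the Java handle but none of the Params machinery."""

    @property
    def r2(self):
        return self._call_java("r2")


summary = LinearRegressionSummary(FakeJavaSummary())
print(summary.r2)
```

With this split, summary classes inherit only the Java handle and the call helper, while estimators and models would pick up `Params` through `JavaParamWrapper`.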


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-201065045
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965926
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -233,6 +234,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared ) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary == None`.
    +        """
    +        java_blrt_summary = self._call_java("summary")
    +        return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_blr_summary = self._call_java("evaluate", dataset)
    +        return BinaryLogisticRegressionSummary(java_blr_summary)
    +
    +
    +class LogisticRegressionSummary(JavaCallable):
    +    """
    +    Abstraction for Logistic Regression Results for a given model.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        Dataframe outputted by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def probabilityCol(self):
    +        """
    +        Field in "predictions" which gives the calibrated probability
    +        of each instance as a vector.
    +        """
    +        return self._call_java("probabilityCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance.
    +        """
    +        return self._call_java("labelCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def featuresCol(self):
    +        """
    +        Field in "predictions" which gives the features of each instance
    +        as a vector.
    +        """
    +        return self._call_java("featuresCol")
    +
    +
    +class LogisticRegressionTrainingSummary(LogisticRegressionSummary):
    --- End diff --
    
    ```@inherit_doc```


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57362618
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -223,3 +223,20 @@ def _call_java(self, name, *args):
             sc = SparkContext._active_spark_context
             java_args = [_py2java(sc, arg) for arg in args]
             return _java2py(sc, m(*java_args))
    +
    +
    +class JavaCallable(object):
    --- End diff --
    
    I tried that before and it got a little messy with the definition of `_java_obj`, but I think I have an idea so let me give it another shot.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-194611784
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52798/


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r58467127
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -539,6 +539,10 @@ class LinearRegressionTrainingSummary private[regression] (
      * Linear regression results evaluated on a dataset.
      *
      * @param predictions predictions outputted by the model's `transform` method.
    + * @param predictionCol Field in "predictions" which gives the predicted value of the label at
    + *                      each instance.
    + * @param labelCol Field in "predictions" which gives the true label of each instance
    + *                 (if available).
    --- End diff --
    
    Hmm, good point; "if available" only makes sense for the generic summary classes, not the *TrainingSummary classes.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r55621515
  
    --- Diff: python/pyspark/mllib/common.py ---
    @@ -130,20 +130,39 @@ def callMLlibFunc(name, *args):
         return callJavaFunc(sc, api, *args)
     
     
    -class JavaModelWrapper(object):
    +class JavaCallable(object):
         """
    -    Wrapper for the model in JVM
    +    Wrapper for an object in JVM to make Java calls
         """
    -    def __init__(self, java_model):
    -        self._sc = SparkContext.getOrCreate()
    -        self._java_model = java_model
    +    def __init__(self, sc, java_obj):
    +        self._sc = sc
    +        self._java_obj = java_obj
     
         def __del__(self):
    -        self._sc._gateway.detach(self._java_model)
    +        self._sc._gateway.detach(self._java_obj)
    +
    +    @classmethod
    +    def fromActiveSparkContext(cls, java_obj):
    +        """Create from a currently active context"""
    +        sc = SparkContext._active_spark_context
    +        return cls(sc, java_obj)
     
         def call(self, name, *a):
    --- End diff --
    
    This ends up in the API docs because ML has the `:inherited-members:` flag turned on [here](https://raw.githubusercontent.com/apache/spark/master/python/docs/pyspark.ml.rst).
    
    Is there any way to avoid this besides making these private?
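
To make the concern concrete, here is a small sketch of how an inherited public method surfaces on every subclass while an underscore-prefixed one does not. The `public_members` helper is hypothetical and only approximates the default behavior of documentation tools that skip names starting with an underscore; the class bodies are placeholders, not the PR's code.

```python
class JavaCallable(object):
    """Placeholder base class with one public and one private helper."""

    def call(self, name, *a):
        """Public helper: inherited by, and listed on, every subclass."""
        return None

    def _call_java(self, name, *a):
        """Private by convention: hidden from the public listing."""
        return None


class LinearRegressionSummary(JavaCallable):
    @property
    def r2(self):
        return None


def public_members(cls):
    # Hypothetical helper: inherited names are included, and anything
    # starting with an underscore is treated as private.
    return sorted(n for n in dir(cls) if not n.startswith("_"))


print(public_members(LinearRegressionSummary))
```

The listing contains `call` alongside `r2`, which is the leak into the API docs described above; renaming the helper with a leading underscore removes it from the listing.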


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-206516989
  
    Cool, thanks!
    > Could you please create a JIRA for updating JavaWrapper and JavaCallable?
    
    Will do


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-194586855
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r55774471
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -34,10 +34,12 @@ class JavaWrapper(Params):
     
         __metaclass__ = ABCMeta
     
    -    #: The wrapped Java companion object. Subclasses should initialize
    -    #: it properly. The param values in the Java object should be
    -    #: synced with the Python wrapper in fit/transform/evaluate/copy.
    -    _java_obj = None
    --- End diff --
    
    @mengxr @jkbradley it seems like this should not be a static (class-level) member, right? From what I could tell, the class-level value is never even used, since the assignment is always to `self._java_obj`. So I changed it to an instance member instead.
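
The distinction raised in this comment can be illustrated with a minimal, self-contained sketch. The class names and string values below are hypothetical stand-ins, not the actual `JavaWrapper` code; the sketch only assumes what the comment describes, namely a class-level `_java_obj = None` that is always assigned through `self`.

```python
class WrapperWithClassAttr(object):
    # Class-level attribute, as in the lines removed by the diff above.
    _java_obj = None

    def __init__(self, java_obj):
        # Assigning through self creates an instance attribute that shadows
        # the class attribute; the class-level value is never updated.
        self._java_obj = java_obj


class WrapperWithInstanceAttr(object):
    def __init__(self, java_obj=None):
        # Instance attribute only: nothing is declared on the class itself.
        self._java_obj = java_obj


# Hypothetical placeholder strings stand in for wrapped Java objects.
a = WrapperWithClassAttr("java-object-a")
b = WrapperWithClassAttr("java-object-b")
c = WrapperWithInstanceAttr("java-object-c")
```

In the first class, `a` and `b` each hold their own value while `WrapperWithClassAttr._java_obj` stays `None`, so the class-level declaration serves only as a default that is never read. The second class behaves the same for instances without that unused declaration.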



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-194611410
  
    **[Test build #52798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52798/consoleFull)** for PR 11621 at commit [`e3ac04c`](https://github.com/apache/spark/commit/e3ac04cfcc9e90a649bd7e46346cee110562b2f7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-197492525
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53335/



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965972
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -238,6 +238,17 @@ def setParams(self, seed=None):
             return self._set(**kwargs)
     
     
    +class HasThrowableProperty(Params):
    --- End diff --
    
    Why move this?



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-198019144
  
    **[Test build #53449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53449/consoleFull)** for PR 11621 at commit [`d7e17ab`](https://github.com/apache/spark/commit/d7e17ab6ab7219394d08b205e892f383f7ca1641).



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11621#issuecomment-201059864
  
    **[Test build #54108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54108/consoleFull)** for PR 11621 at commit [`3571838`](https://github.com/apache/spark/commit/3571838b60fb1f0cf869e89e127b6ad7b95bd3b3).



[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11621#discussion_r57965932
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -233,6 +234,210 @@ def intercept(self):
             """
             return self._call_java("intercept")
     
    +    @property
    +    @since("2.0.0")
    +    def summary(self):
    +        """
    +        Gets summary (e.g. residuals, mse, r-squared) of model on
    +        training set. An exception is thrown if
    +        `trainingSummary is None`.
    +        """
    +        java_blrt_summary = self._call_java("summary")
    +        return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    +
    +    @property
    +    @since("2.0.0")
    +    def hasSummary(self):
    +        """
    +        Indicates whether a training summary exists for this model
    +        instance.
    +        """
    +        return self._call_java("hasSummary")
    +
    +    @since("2.0.0")
    +    def evaluate(self, dataset):
    +        """
    +        Evaluates the model on a test dataset.
    +
    +        :param dataset:
    +          Test dataset to evaluate model on.
    +        """
    +        java_blr_summary = self._call_java("evaluate", dataset)
    +        return BinaryLogisticRegressionSummary(java_blr_summary)
    +
    +
    +class LogisticRegressionSummary(JavaCallable):
    +    """
    +    Abstraction for Logistic Regression Results for a given model.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def predictions(self):
    +        """
    +        DataFrame output by the model's `transform` method.
    +        """
    +        return self._call_java("predictions")
    +
    +    @property
    +    @since("2.0.0")
    +    def probabilityCol(self):
    +        """
    +        Field in "predictions" which gives the calibrated probability
    +        of each instance as a vector.
    +        """
    +        return self._call_java("probabilityCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def labelCol(self):
    +        """
    +        Field in "predictions" which gives the true label of each
    +        instance.
    +        """
    +        return self._call_java("labelCol")
    +
    +    @property
    +    @since("2.0.0")
    +    def featuresCol(self):
    +        """
    +        Field in "predictions" which gives the features of each instance
    +        as a vector.
    +        """
    +        return self._call_java("featuresCol")
    +
    +
    +class LogisticRegressionTrainingSummary(LogisticRegressionSummary):
    +    """
    +    Abstraction for multinomial Logistic Regression Training results.
    +    Currently, the training summary ignores the training weights except
    +    for the objective trace.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @property
    +    @since("2.0.0")
    +    def objectiveHistory(self):
    +        """
    +        Objective function (scaled loss + regularization) at each iteration.
    +        """
    +        return self._call_java("objectiveHistory")
    +
    +    @property
    +    @since("2.0.0")
    +    def totalIterations(self):
    +        """
    +        Number of training iterations until termination.
    +        """
    +        return self._call_java("totalIterations")
    +
    +
    +class BinaryLogisticRegressionSummary(LogisticRegressionSummary):
    --- End diff --
    
    Please add `@inherit_doc` to this class.
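
For readers unfamiliar with the idea behind the decorator mentioned in this comment, here is a minimal sketch of a docstring-inheriting class decorator. It is an illustration only and not PySpark's implementation: `inherit_doc_sketch`, `SummarySketch`, and `BinarySummarySketch` are hypothetical names, and the sketch handles plain methods only.

```python
import inspect


def inherit_doc_sketch(cls):
    """Copy missing docstrings on public methods from parent classes."""
    for name, member in vars(cls).items():
        # Only consider public plain functions that lack a docstring.
        if name.startswith("_") or not inspect.isfunction(member):
            continue
        if member.__doc__:
            continue
        # Walk the parents in method resolution order and take the first
        # docstring found for a member with the same name.
        for parent in cls.__mro__[1:]:
            parent_doc = getattr(getattr(parent, name, None), "__doc__", None)
            if parent_doc:
                member.__doc__ = parent_doc
                break
    return cls


class SummarySketch(object):
    def predictions(self):
        """DataFrame produced by the model's transform method."""
        raise NotImplementedError


@inherit_doc_sketch
class BinarySummarySketch(SummarySketch):
    def predictions(self):
        # No docstring here; the decorator fills it in from the parent.
        return []
```

With a decorator of this kind, a subclass that overrides a documented method without repeating the docstring still shows the parent's documentation in `help()` and in generated API docs.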

