You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by yinxusen <gi...@git.apache.org> on 2016/03/29 02:42:51 UTC

[GitHub] spark pull request: Spark 13786

GitHub user yinxusen opened a pull request:

    https://github.com/apache/spark/pull/12020

    Spark 13786

    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/browse/SPARK-13786
    
    Add save/load for Python CrossValidator/Model and TrainValidationSplit/Model.
    
    ## How was this patch tested?
    
    Test with Python doctest.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yinxusen/spark SPARK-13786

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12020.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12020
    
----
commit 40b8108e015340ed696d52ea5cd3c7a6dc9a272d
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-27T02:52:26Z

    add save/load for cv

commit 0c26dc06693d4424de819e01f6847b94b69c9087
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-27T05:02:46Z

    add save/load for cvModel

commit 73fe5641168fe071ba6ff89c090aadb533ed3fac
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-27T05:46:45Z

    move validatorParams out

commit 1d3eefd4c8489700ee127fdb876cb8d4850dd37d
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-27T06:14:26Z

    add for tvs

commit faac9b4d94e9a7ae84784dceaa268b146ac35ea6
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-28T23:04:02Z

    Merge branch 'master' into SPARK-13786

commit 24605907ec609df13c1eb9edc9187a82e9e3fb80
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-28T23:32:20Z

    fix TrainValidationSplitModel

commit 8056bbeea0061bdd23e5901da3087c5667153682
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-29T00:40:47Z

    fix nits

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-206473867
  
    **[Test build #55125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55125/consoleFull)** for PR 12020 at commit [`65831f5`](https://github.com/apache/spark/commit/65831f55e3d2e93c2420e5cacfbd3e8cd6896de1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-205149215
  
    **[Test build #2737 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2737/consoleFull)** for PR 12020 at commit [`faba3e5`](https://github.com/apache/spark/commit/faba3e503d0d1d8d1f8a8232b034f7c773282071).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596570
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -91,7 +94,90 @@ def build(self):
             return [dict(zip(keys, prod)) for prod in itertools.product(*grid_values)]
     
     
    -class CrossValidator(Estimator, HasSeed):
    +class ValidatorParams(Params):
    +    """
    +    Common params for TrainValidationSplit and CrossValidator.
    +    """
    +
    +    estimator = Param(Params._dummy(), "estimator", "estimator to be cross-validated")
    +    estimatorParamMaps = Param(Params._dummy(), "estimatorParamMaps", "estimator param maps")
    +    evaluator = Param(
    +        Params._dummy(), "evaluator",
    +        "evaluator used to select hyper-parameters that maximize the validator metric")
    +
    +    def __init__(self):
    --- End diff --
    
    no need to include this (default implementation)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-202647635
  
    **[Test build #54391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54391/consoleFull)** for PR 12020 at commit [`8056bbe`](https://github.com/apache/spark/commit/8056bbeea0061bdd23e5901da3087c5667153682).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-205981531
  
    Thanks @holdenk and @jkbradley, I'll fix them soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-203700057
  
    **[Test build #54573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54573/consoleFull)** for PR 12020 at commit [`faba3e5`](https://github.com/apache/spark/commit/faba3e503d0d1d8d1f8a8232b034f7c773282071).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-205963330
  
    Done with my review!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58634140
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -112,15 +198,24 @@ class CrossValidator(Estimator, HasSeed):
         >>> cvModel = cv.fit(dataset)
         >>> evaluator.evaluate(cvModel.transform(dataset))
         0.8333...
    +    >>> cvPath = temp_path + "/cv"
    +    >>> cv.save(cvPath)
    +    >>> loadedCV = CrossValidator.load(cvPath)
    +    >>> loadedCV.getEstimator().uid == cv.getEstimator().uid
    --- End diff --
    
    Ah.. Sorry for forgetting there are unit tests for CrossValidator and TrainValidateSplit in tests.py.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-202656596
  
    **[Test build #54391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54391/consoleFull)** for PR 12020 at commit [`8056bbe`](https://github.com/apache/spark/commit/8056bbeea0061bdd23e5901da3087c5667153682).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58602540
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -480,19 +653,87 @@ def copy(self, extra=None):
                 extra = dict()
             return TrainValidationSplitModel(self.bestModel.copy(extra))
     
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an JavaMLWriter instance for this ML instance."""
    +        return JavaMLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        self.write().save(path)
    +
    +    @classmethod
    +    @since("2.0.0")
    +    def read(cls):
    +        """Returns an MLReader instance for this class."""
    +        return JavaMLReader(cls)
    +
    +    @classmethod
    +    def _from_java(cls, java_stage):
    +        """
    +        Given a Java TrainValidationSplitModel, create and return a Python wrapper of it.
    +        Used for ML persistence.
    +        """
    +
    +        # Load information from java_stage to the instance.
    +        bestModel = JavaWrapper._from_java(java_stage.bestModel())
    +        estimator, epms, evaluator = super(TrainValidationSplitModel, cls)._from_java(java_stage)
    +        # Create a new instance of this stage.
    +        py_stage = cls(bestModel=bestModel)\
    +            .setEstimator(estimator).setEstimatorParamMaps(epms).setEvaluator(evaluator)
    --- End diff --
    
    Use _copyValues


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-206488969
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596618
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -91,7 +94,90 @@ def build(self):
             return [dict(zip(keys, prod)) for prod in itertools.product(*grid_values)]
     
     
    -class CrossValidator(Estimator, HasSeed):
    +class ValidatorParams(Params):
    +    """
    +    Common params for TrainValidationSplit and CrossValidator.
    +    """
    +
    +    estimator = Param(Params._dummy(), "estimator", "estimator to be cross-validated")
    +    estimatorParamMaps = Param(Params._dummy(), "estimatorParamMaps", "estimator param maps")
    +    evaluator = Param(
    +        Params._dummy(), "evaluator",
    +        "evaluator used to select hyper-parameters that maximize the validator metric")
    +
    +    def __init__(self):
    +        super(ValidatorParams, self).__init__()
    +
    +    def setEstimator(self, value):
    +        """
    +        Sets the value of :py:attr:`estimator`.
    +        """
    +        self._paramMap[self.estimator] = value
    +        return self
    +
    +    def getEstimator(self):
    +        """
    +        Gets the value of estimator or its default value.
    +        """
    +        return self.getOrDefault(self.estimator)
    +
    +    def setEstimatorParamMaps(self, value):
    +        """
    +        Sets the value of :py:attr:`estimatorParamMaps`.
    +        """
    +        self._paramMap[self.estimatorParamMaps] = value
    +        return self
    +
    +    def getEstimatorParamMaps(self):
    +        """
    +        Gets the value of estimatorParamMaps or its default value.
    +        """
    +        return self.getOrDefault(self.estimatorParamMaps)
    +
    +    def setEvaluator(self, value):
    +        """
    +        Sets the value of :py:attr:`evaluator`.
    +        """
    +        self._paramMap[self.evaluator] = value
    +        return self
    +
    +    def getEvaluator(self):
    +        """
    +        Gets the value of evaluator or its default value.
    +        """
    +        return self.getOrDefault(self.evaluator)
    +
    +    @classmethod
    +    def _from_java(cls, java_stage):
    --- End diff --
    
    Rename to _from_java_impl since this is a helper for _from_java methods.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-206502532
  
    To clarify: I think it is OK to have save/load examples in the doc tests, but if a test has significant validity testing, then it is better to put it in a unit test.
    
    This LGTM
    Thanks a lot @yinxusen  and @holdenk for helping to review!
    Merging with master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596647
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -236,7 +286,10 @@ def _fit(self, dataset):
             else:
                 bestIndex = np.argmin(metrics)
             bestModel = est.fit(dataset, epm[bestIndex])
    -        return CrossValidatorModel(bestModel)
    +        return CrossValidatorModel(bestModel)\
    +            .setEstimator(self.getEstimator())\
    --- End diff --
    
    Use ```_copyValues``` instead.  Same for TrainValidationSplit


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by holdenk <gi...@git.apache.org>.

Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58419248
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
    @@ -198,6 +202,11 @@ class TrainValidationSplitModel private[ml] (
         @Since("1.5.0") val validationMetrics: Array[Double])
       extends Model[TrainValidationSplitModel] with TrainValidationSplitParams with MLWritable {
     
    +  /** A Java/Python-friendly auxiliary constructor. */
    --- End diff --
    
    If this is intended to be Java friendly then we probably don't want to make it private to ML scope and we should expand on the javadoc. If it is just for Python it seems this would probably be better suited to PythonMLLibAPI.scala


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596676
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -73,8 +73,19 @@ def _transfer_params_to_java(self):
             paramMap = self.extractParamMap()
             for param in self.params:
                 if param in paramMap:
    -                pair = self._make_java_param_pair(param, paramMap[param])
    -                self._java_obj.set(pair)
    +                p, v = self._make_java_param_pair(param, paramMap[param])
    +                self._java_obj.set(p.w(v))
    +
    +    def _transfer_extra_params_to_java(self, extra):
    --- End diff --
    
    Rename to _transfer_param_map_to_java, and rename "extra" to "pyParamMap"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-206493862
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55126/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-203710876
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by holdenk <gi...@git.apache.org>.

Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58418833
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
    @@ -17,6 +17,10 @@
     
     package org.apache.spark.ml.tuning
     
    +import java.{util => ju}
    --- End diff --
    
    I think its more common in our current code to do:
    `import java.util.{List => JList}`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596659
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -289,8 +392,60 @@ def copy(self, extra=None):
                 extra = dict()
             return CrossValidatorModel(self.bestModel.copy(extra))
     
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an JavaMLWriter instance for this ML instance."""
    --- End diff --
    
    "JavaMLWriter"  --> "MLWriter"  (This is an error in the pipeline.py docs too)  Please update this elsewhere in this file as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by holdenk <gi...@git.apache.org>.

Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58421680
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
    @@ -200,6 +204,11 @@ class CrossValidatorModel private[ml] (
         @Since("1.5.0") val avgMetrics: Array[Double])
       extends Model[CrossValidatorModel] with CrossValidatorParams with MLWritable {
     
    +  /** A Java/Python-friendly auxiliary constructor. */
    +  private[ml] def this(uid: String, bestModel: Model[_], avgMetrics: ju.List[Double]) = {
    --- End diff --
    
    Similar comment to the one bellow, if this is intended for both Java and Python then we probably don't want it to be private ml scope - but it seems like the intended use of this is Python and this might be better suited to PythonMLLibAPI.scala


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-205188923
  
    **[Test build #2737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2737/consoleFull)** for PR 12020 at commit [`faba3e5`](https://github.com/apache/spark/commit/faba3e503d0d1d8d1f8a8232b034f7c773282071).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-202656688
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-206476424
  
    **[Test build #55126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55126/consoleFull)** for PR 12020 at commit [`f94fd51`](https://github.com/apache/spark/commit/f94fd51642d32675c8cf60d09b33365b05b4b960).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58590434
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
    @@ -17,6 +17,10 @@
     
     package org.apache.spark.ml.tuning
     
    +import java.{util => ju}
    --- End diff --
    
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-215845757
  
    Reverting via [https://github.com/apache/spark/pull/12782].  See discussion on [https://github.com/apache/spark/pull/12604].


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596691
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -88,6 +99,18 @@ def _transfer_params_from_java(self):
                         value = _java2py(sc, self._java_obj.getOrDefault(java_param))
                         self._paramMap[param] = value
     
    +    def _transfer_extra_params_from_java(self, extra):
    +        """
    +        Transforms extra params of the instance into a Java ParamMap.
    --- End diff --
    
    "Transforms a Java ParamMap into a Python ParamMap"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58602325
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -289,8 +392,60 @@ def copy(self, extra=None):
                 extra = dict()
             return CrossValidatorModel(self.bestModel.copy(extra))
     
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an JavaMLWriter instance for this ML instance."""
    +        return JavaMLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        self.write().save(path)
    +
    +    @classmethod
    +    @since("2.0.0")
    +    def read(cls):
    +        """Returns an MLReader instance for this class."""
    +        return JavaMLReader(cls)
    +
    +    @classmethod
    +    def _from_java(cls, java_stage):
    +        """
    +        Given a Java CrossValidatorModel, create and return a Python wrapper of it.
    +        Used for ML persistence.
    +        """
    +
    +        # Load information from java_stage to the instance.
    +        bestModel = JavaWrapper._from_java(java_stage.bestModel())
    +        estimator, epms, evaluator = super(CrossValidatorModel, cls)._from_java(java_stage)
    +        # Create a new instance of this stage.
    +        py_stage = cls(bestModel=bestModel)\
    +            .setEstimator(estimator).setEstimatorParamMaps(epms).setEvaluator(evaluator)
    --- End diff --
    
    Use _copyValues


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596633
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -112,15 +198,24 @@ class CrossValidator(Estimator, HasSeed):
         >>> cvModel = cv.fit(dataset)
         >>> evaluator.evaluate(cvModel.transform(dataset))
         0.8333...
    +    >>> cvPath = temp_path + "/cv"
    +    >>> cv.save(cvPath)
    +    >>> loadedCV = CrossValidator.load(cvPath)
    +    >>> loadedCV.getEstimator().uid == cv.getEstimator().uid
    --- End diff --
    
    I would recommend not bothering with these comparisons in the doctest since it's part of the documentation.  I would put them (and more) in a unit test.  Same for TrainValidationSplit


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-206488670
  
    **[Test build #55125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55125/consoleFull)** for PR 12020 at commit [`65831f5`](https://github.com/apache/spark/commit/65831f55e3d2e93c2420e5cacfbd3e8cd6896de1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by holdenk <gi...@git.apache.org>.

Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58421715
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
    @@ -17,6 +17,10 @@
     
     package org.apache.spark.ml.tuning
     
    +import java.{util => ju}
    --- End diff --
    
    As bellow the import might be more standard the other way


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596611
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -91,7 +94,90 @@ def build(self):
             return [dict(zip(keys, prod)) for prod in itertools.product(*grid_values)]
     
     
    -class CrossValidator(Estimator, HasSeed):
    +class ValidatorParams(Params):
    +    """
    +    Common params for TrainValidationSplit and CrossValidator.
    +    """
    +
    +    estimator = Param(Params._dummy(), "estimator", "estimator to be cross-validated")
    +    estimatorParamMaps = Param(Params._dummy(), "estimatorParamMaps", "estimator param maps")
    +    evaluator = Param(
    +        Params._dummy(), "evaluator",
    +        "evaluator used to select hyper-parameters that maximize the validator metric")
    +
    +    def __init__(self):
    +        super(ValidatorParams, self).__init__()
    +
    +    def setEstimator(self, value):
    +        """
    +        Sets the value of :py:attr:`estimator`.
    +        """
    +        self._paramMap[self.estimator] = value
    --- End diff --
    
    As long as this is being modified, each of these should call ```_set``` rather than accessing ```_paramMap``` directly


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-205950097
  
    I sent most of my comments but am not quite done yet.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596688
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -88,6 +99,18 @@ def _transfer_params_from_java(self):
                         value = _java2py(sc, self._java_obj.getOrDefault(java_param))
                         self._paramMap[param] = value
     
    +    def _transfer_extra_params_from_java(self, extra):
    --- End diff --
    
    Same here: Rename to _transfer_param_map_from_java, and rename "extra" to "java_param_map"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-203710880
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54573/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596552
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -932,6 +932,11 @@ final class ParamMap private[ml] (private val map: mutable.Map[Param[Any], Any])
         }
       }
     
    +  /** A Java compatible for Python and Java */
    --- End diff --
    
    "Java-friendly method for Python API"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12020


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-203710594
  
    **[Test build #54573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54573/consoleFull)** for PR 12020 at commit [`faba3e5`](https://github.com/apache/spark/commit/faba3e503d0d1d8d1f8a8232b034f7c773282071).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-206493859
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596672
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -64,7 +64,7 @@ def _make_java_param_pair(self, param, value):
             param = self._resolveParam(param)
             java_param = self._java_obj.getParam(param.name)
             java_value = _py2java(sc, value)
    -        return java_param.w(java_value)
    +        return java_param, java_value
    --- End diff --
    
    This should not be changed.  I don't think you need to change it for its uses below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-206493658
  
    **[Test build #55126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55126/consoleFull)** for PR 12020 at commit [`f94fd51`](https://github.com/apache/spark/commit/f94fd51642d32675c8cf60d09b33365b05b4b960).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596684
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -73,8 +73,19 @@ def _transfer_params_to_java(self):
             paramMap = self.extractParamMap()
             for param in self.params:
                 if param in paramMap:
    -                pair = self._make_java_param_pair(param, paramMap[param])
    -                self._java_obj.set(pair)
    +                p, v = self._make_java_param_pair(param, paramMap[param])
    +                self._java_obj.set(p.w(v))
    +
    +    def _transfer_extra_params_to_java(self, extra):
    +        """
    +        Transforms extra params of the instance into a Java ParamMap.
    --- End diff --
    
    "Transforms a Python ParamMap into a Java ParamMap"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-206488972
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55125/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596564
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -91,7 +94,90 @@ def build(self):
             return [dict(zip(keys, prod)) for prod in itertools.product(*grid_values)]
     
     
    -class CrossValidator(Estimator, HasSeed):
    +class ValidatorParams(Params):
    --- End diff --
    
    This could inherit from HasSeed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58590239
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
    @@ -200,6 +204,11 @@ class CrossValidatorModel private[ml] (
         @Since("1.5.0") val avgMetrics: Array[Double])
       extends Model[CrossValidatorModel] with CrossValidatorParams with MLWritable {
     
    +  /** A Java/Python-friendly auxiliary constructor. */
    +  private[ml] def this(uid: String, bestModel: Model[_], avgMetrics: ju.List[Double]) = {
    --- End diff --
    
    We haven't used PythonMLlibAPI.scala for spark.ml yet, so I think it's OK to leave here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12020#issuecomment-202656689
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54391/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13786][ML][Pyspark] Add save/load for p...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12020#discussion_r58596628
  
    --- Diff: python/pyspark/ml/tuning.py ---
    @@ -91,7 +94,90 @@ def build(self):
             return [dict(zip(keys, prod)) for prod in itertools.product(*grid_values)]
     
     
    -class CrossValidator(Estimator, HasSeed):
    +class ValidatorParams(Params):
    +    """
    +    Common params for TrainValidationSplit and CrossValidator.
    +    """
    +
    +    estimator = Param(Params._dummy(), "estimator", "estimator to be cross-validated")
    +    estimatorParamMaps = Param(Params._dummy(), "estimatorParamMaps", "estimator param maps")
    +    evaluator = Param(
    +        Params._dummy(), "evaluator",
    +        "evaluator used to select hyper-parameters that maximize the validator metric")
    +
    +    def __init__(self):
    +        super(ValidatorParams, self).__init__()
    +
    +    def setEstimator(self, value):
    +        """
    +        Sets the value of :py:attr:`estimator`.
    +        """
    +        self._paramMap[self.estimator] = value
    +        return self
    +
    +    def getEstimator(self):
    +        """
    +        Gets the value of estimator or its default value.
    +        """
    +        return self.getOrDefault(self.estimator)
    +
    +    def setEstimatorParamMaps(self, value):
    +        """
    +        Sets the value of :py:attr:`estimatorParamMaps`.
    +        """
    +        self._paramMap[self.estimatorParamMaps] = value
    +        return self
    +
    +    def getEstimatorParamMaps(self):
    +        """
    +        Gets the value of estimatorParamMaps or its default value.
    +        """
    +        return self.getOrDefault(self.estimatorParamMaps)
    +
    +    def setEvaluator(self, value):
    +        """
    +        Sets the value of :py:attr:`evaluator`.
    +        """
    +        self._paramMap[self.evaluator] = value
    +        return self
    +
    +    def getEvaluator(self):
    +        """
    +        Gets the value of evaluator or its default value.
    +        """
    +        return self.getOrDefault(self.evaluator)
    +
    +    @classmethod
    +    def _from_java(cls, java_stage):
    +        """
    +        Return Python estimator, estimatorParamMaps, and evaluator from a Java ValidatorParams.
    +        """
    +
    +        # Load information from java_stage to the instance.
    +        estimator = JavaWrapper._from_java(java_stage.getEstimator())
    +        evaluator = JavaWrapper._from_java(java_stage.getEvaluator())
    +        epms = [estimator._transfer_extra_params_from_java(epm)
    +                for epm in java_stage.getEstimatorParamMaps()]
    +        return estimator, epms, evaluator
    +
    +    def _to_java(self):
    --- End diff --
    
    Rename to _to_java_impl since this is a helper.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org