You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by BryanCutler <gi...@git.apache.org> on 2017/05/03 23:02:17 UTC

[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/17849

    [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Values from Estimator

    ## What changes were proposed in this pull request?
    
    Added call to copy values of Params from Estimator to Model after fit in PySpark ML.  This will copy values for any params that are also defined in the Model.  Since currently most Models do not define the same params from the Estimator, also added method to create new Params from looking at the Java object if they do not exist in the Python object.  This is a temporary fix that can be removed once the PySpark models properly define the params themselves.
    
    ## How was this patch tested?
    
    Refactored the `check_params` test to optionally check if the model params for Python and Java match and added this check to an existing fitted model that shares params between Estimator and Model.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark pyspark-models-own-params-SPARK-10931

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17849
    
----
commit a4ede3fff9c012bbd609d3fad13a8c100856f950
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-03-13T23:43:41Z

    added regression test case for PySpark models not owning params

commit 3b921a469a75b243c97e93548ed3de4bfae040a9
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-03-13T23:44:22Z

    fixed default PySpark param value that was being overlooked by return instead of continue

commit dff7863d1ecd45a10c225a2ea95cc51814705ed0
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-03-14T00:29:10Z

    added copy of param values to python model when estimator fit is called

commit 398ef27874e59c615af77b3c838880c68de97e35
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-03-14T21:10:03Z

    Added temporary fix to add Params when fitting and persisting models

commit 1f3de13e55eb0d56f42104ceb71235da9284ef28
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-05-02T22:46:49Z

    Merge remote-tracking branch 'upstream/master' into pyspark-models-own-params-SPARK-10931

commit d621c8940421258518345eda41e775b7e65e8e8f
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-05-02T23:38:07Z

    added check for NaN default param values

commit acdb4b94517f2c740fe53b80ff6870d894a885aa
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-05-03T22:32:45Z

    need to create params from java when model is fit and unpersisted in order to match

commit 9b7b886125eeb389d48ea398f6305d05b29840c9
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-05-03T22:40:34Z

    removed blank line

commit 765eb5f77335232eff0889fbc7401f1e77e16dc9
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-05-03T22:55:37Z

    cleaned old comment block in test

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #80506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80506/testReport)** for PR 17849 at commit [`07f6e85`](https://github.com/apache/spark/commit/07f6e8594e46106830f3a1e8c7bb66bbaa26bb5a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    @jkbradley @holdenk the heart of this change is just adding the call to `_copyValues` to copy param values from Estimator to Model.  That doesn't really do much though, since most of the Python models do not define any params and there is nothing to copy to.   So I added a temporary little hack to look at the Java Model params after fitting and create any params that don't already exist, then any set values can be copied.  Also needed to do the same after loading a Python model or this will fail persistence tests.
    
    I know having this temporary 'fix' isn't ideal but it would allow us to incrementally add missing Params or restructure class hierarchy to match Scala versions and will continue to copy these values to the Models.  Until that is done, there won't be explicit methods to get each param, such as `getMaxDepth()` but the param value can still be accessed by `param.getOrDefault("maxDepth")` to give users a workaround for all of those type of JIRAs that have come up.  What do you guys think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132355421
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -135,6 +135,20 @@ def _transfer_param_map_to_java(self, pyParamMap):
                     paramMap.put([pair])
             return paramMap
     
    +    def _create_params_from_java(self):
    +        """
    +        SPARK-10931: Temporary fix to create params that are defined in the Java obj but not here
    +        """
    +        java_params = list(self._java_obj.params())
    +        from pyspark.ml.param import Param
    +        for java_param in java_params:
    +            java_param_name = java_param.name()
    +            if not hasattr(self, java_param_name):
    +                param = Param(self, java_param_name, java_param.doc())
    +                setattr(param, "created_from_java_param", True)
    --- End diff --
    
    BTW, would you mind if I ask where `created_from_java_param` is used? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    ping @jkbradley @holdenk , please have a look when you can, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #80499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80499/testReport)** for PR 17849 at commit [`f4a657e`](https://github.com/apache/spark/commit/f4a657e08698c07f658f8d23465af899d212d099).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #76429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76429/testReport)** for PR 17849 at commit [`765eb5f`](https://github.com/apache/spark/commit/765eb5f77335232eff0889fbc7401f1e77e16dc9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Oh, wait, this looks not requiring ML bit much. Will try to give a pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132360643
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1325,7 +1325,7 @@ def __init__(self, featuresCol="features", labelCol="label", predictionCol="pred
             super(MultilayerPerceptronClassifier, self).__init__()
             self._java_obj = self._new_java_obj(
                 "org.apache.spark.ml.classification.MultilayerPerceptronClassifier", self.uid)
    -        self._setDefault(maxIter=100, tol=1E-4, blockSize=128, stepSize=0.03, solver="l-bfgs")
    +        self._setDefault(maxIter=100, tol=1E-6, blockSize=128, stepSize=0.03, solver="l-bfgs")
    --- End diff --
    
    Looks like 1e-6 is correct default value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #80506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80506/testReport)** for PR 17849 at commit [`07f6e85`](https://github.com/apache/spark/commit/07f6e8594e46106830f3a1e8c7bb66bbaa26bb5a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    I think its good to go for master pending jenkins (it's been awhile since the last run). So let's just make sure everything is still ok: Jenkins retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r114675040
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1355,7 +1370,7 @@ def test_java_params(self):
                 for name, cls in inspect.getmembers(module, inspect.isclass):
                     if not name.endswith('Model') and issubclass(cls, JavaParams)\
                             and not inspect.isabstract(cls):
    -                    self.check_params(cls())
    +                    ParamTests.check_params(self, cls(), check_params_exist=False)
    --- End diff --
    
    Setting `check_params_exist` to True will uncover any params that exist in Java but not in Python


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132360069
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -417,6 +417,54 @@ def test_logistic_regression_check_thresholds(self):
                 LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
             )
     
    +    @staticmethod
    +    def check_params(test_self, py_stage, check_params_exist=True):
    +        """
    +        Checks common requirements for Params.params:
    +          - set of params exist in Java and Python and are ordered by names
    +          - param parent has the same UID as the object's UID
    +          - default param value from Java matches value in Python
    +          - optionally check if all params from Java also exist in Python
    +        """
    +        py_stage_str = "%s %s" % (type(py_stage), py_stage)
    +        if not hasattr(py_stage, "_to_java"):
    +            return
    +        java_stage = py_stage._to_java()
    +        if java_stage is None:
    +            return
    +        test_self.assertEqual(py_stage.uid, java_stage.uid(), msg=py_stage_str)
    +        if check_params_exist:
    +            param_names = [p.name for p in py_stage.params]
    +            java_params = list(java_stage.params())
    +            java_param_names = [jp.name() for jp in java_params]
    +            test_self.assertEqual(
    +                param_names, sorted(java_param_names),
    +                "Param list in Python does not match Java for %s:\nJava = %s\nPython = %s"
    +                % (py_stage_str, java_param_names, param_names))
    --- End diff --
    
    Line 436-443 is the only change to `check_params`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132520493
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -144,7 +158,9 @@ def _transfer_params_from_java(self):
                 if self._java_obj.hasParam(param.name):
                     java_param = self._java_obj.getParam(param.name)
                     # SPARK-14931: Only check set params back to avoid default params mismatch.
    -                if self._java_obj.isSet(java_param):
    +                if self._java_obj.isSet(java_param) or (
    +                        # SPARK-10931: Temporary fix for params that have a default in Java
    +                        self._java_obj.hasDefault(java_param) and not self.isDefined(param)):
    --- End diff --
    
    True.  I was thinking since this is part of the temporary fix, then it doesn't matter, but it won't be much extra to use `_setDefault` and probably be clearer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132531194
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1572,7 +1588,8 @@ def test_java_params(self):
                 for name, cls in inspect.getmembers(module, inspect.isclass):
                     if not name.endswith('Model') and issubclass(cls, JavaParams)\
                             and not inspect.isabstract(cls):
    -                    self.check_params(cls())
    +                    # NOTE: disable check_params_exist until there is parity with Scala API
    +                    ParamTests.check_params(self, cls(), check_params_exist=False)
    --- End diff --
    
    Yes, ideally but most of the models need to be trained first so that is why they are skipped here.  Some basic framework would need to be added to allow this, and I'm looking into that as a follow on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Merged to master, thanks everyone :) (There is also a follow up JIRA https://issues.apache.org/jira/browse/SPARK-21812 for explicitly defining all of the params in Python).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Sorry, let me try and take a look tomorrow.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132357684
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -135,6 +135,20 @@ def _transfer_param_map_to_java(self, pyParamMap):
                     paramMap.put([pair])
             return paramMap
     
    +    def _create_params_from_java(self):
    +        """
    +        SPARK-10931: Temporary fix to create params that are defined in the Java obj but not here
    +        """
    +        java_params = list(self._java_obj.params())
    +        from pyspark.ml.param import Param
    +        for java_param in java_params:
    +            java_param_name = java_param.name()
    +            if not hasattr(self, java_param_name):
    --- End diff --
    
    If self contains a same name attribute which is not a `Param`, should we process it like throw exception?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132530521
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -417,6 +417,54 @@ def test_logistic_regression_check_thresholds(self):
                 LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
             )
     
    +    @staticmethod
    +    def check_params(test_self, py_stage, check_params_exist=True):
    +        """
    +        Checks common requirements for Params.params:
    +          - set of params exist in Java and Python and are ordered by names
    +          - param parent has the same UID as the object's UID
    +          - default param value from Java matches value in Python
    +          - optionally check if all params from Java also exist in Python
    +        """
    +        py_stage_str = "%s %s" % (type(py_stage), py_stage)
    +        if not hasattr(py_stage, "_to_java"):
    +            return
    +        java_stage = py_stage._to_java()
    +        if java_stage is None:
    +            return
    +        test_self.assertEqual(py_stage.uid, java_stage.uid(), msg=py_stage_str)
    +        if check_params_exist:
    +            param_names = [p.name for p in py_stage.params]
    +            java_params = list(java_stage.params())
    +            java_param_names = [jp.name() for jp in java_params]
    +            test_self.assertEqual(
    +                param_names, sorted(java_param_names),
    +                "Param list in Python does not match Java for %s:\nJava = %s\nPython = %s"
    +                % (py_stage_str, java_param_names, param_names))
    --- End diff --
    
    I also changed the return to continue on line 454, this loop is checking all params so it was meant to skip over random seed params - not break out of the loop entirely (this is why that default value for MLP was missed).  I cleaned up the NaN checks, before it was just checking for Imputer params, but it should be the same for any params with NaN's as default values.  This is lines 460-462


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r115311169
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -404,6 +404,53 @@ def test_copy_param_extras(self):
             self.assertEqual(tp._paramMap, copied_no_extra)
             self.assertEqual(tp._defaultParamMap, tp_copy._defaultParamMap)
     
    +    @staticmethod
    +    def check_params(test_self, py_stage, check_params_exist=True):
    --- End diff --
    
    no problem!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Thanks your work on this but I am curious what is the benefit of doing this? In pyspark there is no param in Model itself currently, what is the problem or bugs it can resolve after adding params to pyspark model ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132361043
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1572,7 +1588,8 @@ def test_java_params(self):
                 for name, cls in inspect.getmembers(module, inspect.isclass):
                     if not name.endswith('Model') and issubclass(cls, JavaParams)\
                             and not inspect.isabstract(cls):
    -                    self.check_params(cls())
    +                    # NOTE: disable check_params_exist until there is parity with Scala API
    +                    ParamTests.check_params(self, cls(), check_params_exist=False)
    --- End diff --
    
    This skips param test for Model. Should we do similar check to all models?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #76429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76429/testReport)** for PR 17849 at commit [`765eb5f`](https://github.com/apache/spark/commit/765eb5f77335232eff0889fbc7401f1e77e16dc9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #81004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81004/testReport)** for PR 17849 at commit [`07f6e85`](https://github.com/apache/spark/commit/07f6e8594e46106830f3a1e8c7bb66bbaa26bb5a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r115129846
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -263,7 +282,14 @@ def _fit_java(self, dataset):
     
         def _fit(self, dataset):
             java_model = self._fit_java(dataset)
    -        return self._create_model(java_model)
    +        model = self._create_model(java_model)
    +
    +        # SPARK-10931: This is a temporary fix to allow models to own params
    +        # from estimators. Eventually, these params should be in models through
    +        # using common base classes between estimators and models.
    +        model._create_params_from_java()
    --- End diff --
    
    So right now this would apply to all of the models, would it make sense to make it so that we can selectively move the params forward one at a time?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132358656
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -263,7 +284,8 @@ def _fit_java(self, dataset):
     
         def _fit(self, dataset):
             java_model = self._fit_java(dataset)
    -        return self._create_model(java_model)
    +        model = self._create_model(java_model)
    +        return self._copyValues(model)
    --- End diff --
    
    Here I think it is going to copy values from the estimator to the created model. So I think we assume that the params in estimator and model are the same?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r115310843
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1355,7 +1370,7 @@ def test_java_params(self):
                 for name, cls in inspect.getmembers(module, inspect.isclass):
                     if not name.endswith('Model') and issubclass(cls, JavaParams)\
                             and not inspect.isabstract(cls):
    -                    self.check_params(cls())
    +                    ParamTests.check_params(self, cls(), check_params_exist=False)
    --- End diff --
    
    sure, will do


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132516315
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1325,7 +1325,7 @@ def __init__(self, featuresCol="features", labelCol="label", predictionCol="pred
             super(MultilayerPerceptronClassifier, self).__init__()
             self._java_obj = self._new_java_obj(
                 "org.apache.spark.ml.classification.MultilayerPerceptronClassifier", self.uid)
    -        self._setDefault(maxIter=100, tol=1E-4, blockSize=128, stepSize=0.03, solver="l-bfgs")
    +        self._setDefault(maxIter=100, tol=1E-6, blockSize=128, stepSize=0.03, solver="l-bfgs")
    --- End diff --
    
    Yes, the `check_params` test was meant to catch that but was broken


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    This looks pretty reasonable, sorry for the delay. If you have a chance to update this to master would be good to do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Thanks @holdenk for the review!  I think I wrote the description a little too rushed, so let me clarify a bit... 
    
    The temporary "fix" will just create empty params in the model if they exist in the Java model but not the Python one.  There should be no risk of having these added to the Python model since they are empty when created and not yet defined with a value.  These params will be set in 2 ways: 1) after the model is fit in the call to `_copy_values` where the value is copied from the estimator for any matching params, 2) when the model is loaded there is a call to `_transfer_params_from_java` that will copy value if the the Java param has been explicitly set (I think I need to add something here for the case that the Java model has a default value but Python model doesn't). 
    
    I think the best way forward to get parity with the Scala API is to then organize a JIRA with subtasks to update the Python ML class hierarchies to match the Scala ones, so that the Params will be defined that way with proper "get" and "set" methods too.  It might be good to also have a Python test that checks for matching params in Java for both the estimators and models.  It could be ignored by default and then enabled during the QA period.  The temporary fix here would continue to work and not interfere while the params are being added.  It could be removed once we feel that most of the params have been properly added and close to matching the Scala API.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132522347
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -263,7 +284,8 @@ def _fit_java(self, dataset):
     
         def _fit(self, dataset):
             java_model = self._fit_java(dataset)
    -        return self._create_model(java_model)
    +        model = self._create_model(java_model)
    +        return self._copyValues(model)
    --- End diff --
    
    Yes, that is the assumption and it's the same on the Scala side too.  The estimators and models should both have a shared mixin that defines the common params used.  That's how it's done on the Scala side and Python should follow (once that's done, the temporary fix from here can be removed).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132516813
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -135,6 +135,20 @@ def _transfer_param_map_to_java(self, pyParamMap):
                     paramMap.put([pair])
             return paramMap
     
    +    def _create_params_from_java(self):
    +        """
    +        SPARK-10931: Temporary fix to create params that are defined in the Java obj but not here
    +        """
    +        java_params = list(self._java_obj.params())
    +        from pyspark.ml.param import Param
    +        for java_param in java_params:
    +            java_param_name = java_param.name()
    +            if not hasattr(self, java_param_name):
    +                param = Param(self, java_param_name, java_param.doc())
    +                setattr(param, "created_from_java_param", True)
    --- End diff --
    
    Since this is part of a temporary fix to add Params that are defined in Java but not in Python, then this just adds a tag to the Param in case something goes wrong we will know the param was created here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80506/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #76596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76596/testReport)** for PR 17849 at commit [`4a66e90`](https://github.com/apache/spark/commit/4a66e90814f14b4a64900f11c0704b83958f0b9a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r114673254
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1308,40 +1357,6 @@ class DefaultValuesTests(PySparkTestCase):
         those in their Scala counterparts.
         """
     
    -    def check_params(self, py_stage):
    -        import pyspark.ml.feature
    -        if not hasattr(py_stage, "_to_java"):
    -            return
    -        java_stage = py_stage._to_java()
    -        if java_stage is None:
    -            return
    -        for p in py_stage.params:
    -            java_param = java_stage.getParam(p.name)
    -            py_has_default = py_stage.hasDefault(p)
    -            java_has_default = java_stage.hasDefault(java_param)
    -            self.assertEqual(py_has_default, java_has_default,
    -                             "Default value mismatch of param %s for Params %s"
    -                             % (p.name, str(py_stage)))
    -            if py_has_default:
    -                if p.name == "seed":
    -                    return  # Random seeds between Spark and PySpark are different
    --- End diff --
    
    this should not return, I changed it to `continue` above


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #76596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76596/testReport)** for PR 17849 at commit [`4a66e90`](https://github.com/apache/spark/commit/4a66e90814f14b4a64900f11c0704b83958f0b9a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r114673096
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1325,7 +1325,7 @@ def __init__(self, featuresCol="features", labelCol="label", predictionCol="pred
             super(MultilayerPerceptronClassifier, self).__init__()
             self._java_obj = self._new_java_obj(
                 "org.apache.spark.ml.classification.MultilayerPerceptronClassifier", self.uid)
    -        self._setDefault(maxIter=100, tol=1E-4, blockSize=128, stepSize=0.03, solver="l-bfgs")
    +        self._setDefault(maxIter=100, tol=1E-6, blockSize=128, stepSize=0.03, solver="l-bfgs")
    --- End diff --
    
    This is a difference in default values between Python and Java that wasn't being caught because of `check_params` prematurely returning


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    @holdenk , do you think this is good to go now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    ping @holdenk - think this is good to go?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #80499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80499/testReport)** for PR 17849 at commit [`f4a657e`](https://github.com/apache/spark/commit/f4a657e08698c07f658f8d23465af899d212d099).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    What do you think about this ? @jkbradley


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r115320426
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -263,7 +282,14 @@ def _fit_java(self, dataset):
     
         def _fit(self, dataset):
             java_model = self._fit_java(dataset)
    -        return self._create_model(java_model)
    +        model = self._create_model(java_model)
    +
    +        # SPARK-10931: This is a temporary fix to allow models to own params
    +        # from estimators. Eventually, these params should be in models through
    +        # using common base classes between estimators and models.
    +        model._create_params_from_java()
    --- End diff --
    
    I don't think there is really any downside of just creating all the Params from Java, see my comment below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r115129786
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1355,7 +1370,7 @@ def test_java_params(self):
                 for name, cls in inspect.getmembers(module, inspect.isclass):
                     if not name.endswith('Model') and issubclass(cls, JavaParams)\
                             and not inspect.isabstract(cls):
    -                    self.check_params(cls())
    +                    ParamTests.check_params(self, cls(), check_params_exist=False)
    --- End diff --
    
    This might make sense to include as a comment in the code for whoever is coming to update this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80499/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r114847413
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -263,7 +282,14 @@ def _fit_java(self, dataset):
     
         def _fit(self, dataset):
             java_model = self._fit_java(dataset)
    -        return self._create_model(java_model)
    +        model = self._create_model(java_model)
    +
    +        # SPARK-10931: This is a temporary fix to allow models to own params
    +        # from estimators. Eventually, these params should be in models through
    +        # using common base classes between estimators and models.
    +        model._create_params_from_java()
    --- End diff --
    
    This might be better to move to `JavaModel.__init()` for the case of creating a model without fitting - e.g. `CountVectorizerModel` from vocabulary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    LGTM, its certainly sort of an intermediary fix state but making the params accessible without users having to go through py4j manually is worth while.
    
    I'll leave this over the weekend in case anyone has issues.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Thanks @holdenk!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Thanks for reviewing @viirya and @HyukjinKwon !  
    Btw, the temporary fix I talk about here is an optional addition to this PR to allow users to access model param values this way `decision_tree_model.getOrDefault("maxDepth")` as a workaround until proper accessors (like `getMaxDepth()`) can be added, since I've seen a lot of JIRAs with people asking for this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #80089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80089/testReport)** for PR 17849 at commit [`4affa01`](https://github.com/apache/spark/commit/4affa019da7fd7d6502a3215a722548d911e3654).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17849


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76596/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r115115239
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -404,6 +404,53 @@ def test_copy_param_extras(self):
             self.assertEqual(tp._paramMap, copied_no_extra)
             self.assertEqual(tp._defaultParamMap, tp_copy._defaultParamMap)
     
    +    @staticmethod
    +    def check_params(test_self, py_stage, check_params_exist=True):
    --- End diff --
    
    Thank so you much for putting in the time on this. :D :D <3


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132541048
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -144,7 +158,9 @@ def _transfer_params_from_java(self):
                 if self._java_obj.hasParam(param.name):
                     java_param = self._java_obj.getParam(param.name)
                     # SPARK-14931: Only check set params back to avoid default params mismatch.
    -                if self._java_obj.isSet(java_param):
    +                if self._java_obj.isSet(java_param) or (
    +                        # SPARK-10931: Temporary fix for params that have a default in Java
    +                        self._java_obj.hasDefault(java_param) and not self.isDefined(param)):
    --- End diff --
    
    ok, fixed to use _setDefault


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #80089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80089/testReport)** for PR 17849 at commit [`4affa01`](https://github.com/apache/spark/commit/4affa019da7fd7d6502a3215a722548d911e3654).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132518262
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -135,6 +135,20 @@ def _transfer_param_map_to_java(self, pyParamMap):
                     paramMap.put([pair])
             return paramMap
     
    +    def _create_params_from_java(self):
    +        """
    +        SPARK-10931: Temporary fix to create params that are defined in the Java obj but not here
    +        """
    +        java_params = list(self._java_obj.params())
    +        from pyspark.ml.param import Param
    +        for java_param in java_params:
    +            java_param_name = java_param.name()
    +            if not hasattr(self, java_param_name):
    --- End diff --
    
    Good point, it's possible that there could be an attribute with that name that is not a param.  If that's the case, then it is probably best to just ignore silently since this is not critical to the model.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76429/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132359369
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -144,7 +158,9 @@ def _transfer_params_from_java(self):
                 if self._java_obj.hasParam(param.name):
                     java_param = self._java_obj.getParam(param.name)
                     # SPARK-14931: Only check set params back to avoid default params mismatch.
    -                if self._java_obj.isSet(java_param):
    +                if self._java_obj.isSet(java_param) or (
    +                        # SPARK-10931: Temporary fix for params that have a default in Java
    +                        self._java_obj.hasDefault(java_param) and not self.isDefined(param)):
    --- End diff --
    
    This change will make a default value for a param in java side as an user-provided param value in python side.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80089/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81004/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Thanks @holdenk!  Sure, I'll update to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132330891
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -263,7 +284,8 @@ def _fit_java(self, dataset):
     
         def _fit(self, dataset):
             java_model = self._fit_java(dataset)
    -        return self._create_model(java_model)
    +        model = self._create_model(java_model)
    +        return self._copyValues(model)
    --- End diff --
    
    This is the crucial line being added in this PR.  Without this, if a Python model defines a param (matching one from Scala), then when the model is fit in Scala that param value will never be sent back to Python.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    ping @holdenk , also @HyukjinKwon if you are able to take a look


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    I am rather a backend developer and work together with data scientists. So, my ML knowledge is limited (am studying hard :)). Will leave few comments together if there are some nits and someone starts to review so that they can be addressed together. 
    
    cc @viirya who I believe knows ML bit and @zero323 who I believe should be able to review this (but now is inactive though), are you maybe able to make a pass for this one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    If params are defined in the PySpark model, when that model is fit a Scala version is created then the PySpark model is wrapped around it.  The param values from the Scala version are never transferred to the PySpark model, so the defined params will only have default values.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17849
  
    **[Test build #81004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81004/testReport)** for PR 17849 at commit [`07f6e85`](https://github.com/apache/spark/commit/07f6e8594e46106830f3a1e8c7bb66bbaa26bb5a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org