Posted to reviews@spark.apache.org by ZacharySBrown <gi...@git.apache.org> on 2015/12/21 22:20:23 UTC

[GitHub] spark pull request: [SPARK-12468] [Pyspark]

GitHub user ZacharySBrown opened a pull request:

    https://github.com/apache/spark/pull/10419

    [SPARK-12468] [Pyspark]

    This addresses an issue where the `extractParamMap()` method of a fitted model returns an empty dictionary, e.g. (from the [PySpark ML API Documentation](http://spark.apache.org/docs/latest/ml-guide.html#example-estimator-transformer-and-param)):
    
    ```python
    from pyspark.mllib.linalg import Vectors
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.param import Param, Params
    
    # Prepare training data from a list of (label, features) tuples.
    training = sqlContext.createDataFrame([
        (1.0, Vectors.dense([0.0, 1.1, 0.1])),
        (0.0, Vectors.dense([2.0, 1.0, -1.0])),
        (0.0, Vectors.dense([2.0, 1.3, 1.0])),
        (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"])
    
    # Create a LogisticRegression instance. This instance is an Estimator.
    lr = LogisticRegression(maxIter=10, regParam=0.01)
    # Print out the parameters, documentation, and any default values.
    print("LogisticRegression parameters:\n" + lr.explainParams() + "\n")
    
    # Learn a LogisticRegression model. This uses the parameters stored in lr.
    model1 = lr.fit(training)
    
    # Since model1 is a Model (i.e., a transformer produced by an Estimator),
    # we can view the parameters it used during fit().
    # This prints the parameter (name: value) pairs, where names are unique IDs for this
    # LogisticRegression instance.
    print("Model 1 was fit using parameters: ")
    print(model1.extractParamMap())
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ZacharySBrown/spark Pyspark_extractParamMap_Fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10419
    
----
commit 6e7c80b6805fc4b6ef4b60c18cb699385ed3bc2e
Author: Zak Brown <za...@cloudera.com>
Date:   2015-12-21T20:28:51Z

    Updated _fit() method of JavaEstimator Class to update paramMap for the returned model

commit 1c5f4998b0bb88dfb7a650525e46853ddbd65ea8
Author: Zak Brown <za...@cloudera.com>
Date:   2015-12-21T21:15:04Z

    Removed extra spaces in modifications to wrapper.py to conform with PEP8 standards

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12468] [Pyspark] extractParamMap return...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/10419#issuecomment-166795079
  
    Furthermore, I think we should update the PySpark ML API documentation that you mentioned. To view the parameters used during ```fit()```, you should call ```model1.parent.extractParamMap()``` rather than ```model1.extractParamMap()```.
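
A minimal sketch of the distinction drawn above, using toy stand-in classes rather than the real PySpark ones. `ToyEstimator` and `ToyModel` are hypothetical names introduced only for illustration, and whether `parent` is populated on a real Python model depends on the Spark version in use.

```python
class ToyModel(object):
    """Hypothetical stand-in for a fitted model whose own param map is empty."""

    def __init__(self):
        self.parent = None    # set by the estimator that produced this model
        self._paramMap = {}   # nothing copied here, which mirrors the reported bug

    def extractParamMap(self):
        return dict(self._paramMap)


class ToyEstimator(object):
    """Hypothetical stand-in for an Estimator that remembers its params."""

    def __init__(self, **params):
        self._paramMap = dict(params)

    def extractParamMap(self):
        return dict(self._paramMap)

    def fit(self):
        model = ToyModel()
        model.parent = self   # the model keeps a reference to its estimator
        return model


lr = ToyEstimator(maxIter=10, regParam=0.01)
model1 = lr.fit()
print(model1.extractParamMap())         # the model's own (empty) map
print(model1.parent.extractParamMap())  # the params the estimator was fit with
```

In this sketch the parameters used during fitting are only reachable through `model1.parent`, which is the lookup the comment recommends documenting.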



[GitHub] spark pull request: [SPARK-12468] [Pyspark] extractParamMap return...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/10419#issuecomment-166794601
  
    @ZacharySBrown Thanks for catching this bug. However, I don't think setting ```a._paramMap``` with ```self.extractParamMap()``` is appropriate, because it sets the child/Model's ```_paramMap``` from its parent/Estimator's ```_paramMap```, and this has already been done on the Scala side. I think what we should do instead is call [```_transfer_params_from_java```](https://github.com/apache/spark/blob/master/python/pyspark/ml/wrapper.py#L85), which will transfer the embedded params from the companion Java model.
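
The suggested direction can be sketched roughly as follows. This is a toy model of the idea, not the code in `wrapper.py`: `ToyJavaModel`, `ToyModel`, `ToyEstimator`, `param_items`, and the `transfer` flag are all hypothetical names, and the plain dictionary only stands in for the params that the Scala side has already copied onto the companion model.

```python
class ToyJavaModel(object):
    """Hypothetical stand-in for the JVM-side model, which already holds the params."""

    def __init__(self, params):
        self._params = dict(params)

    def param_items(self):
        return list(self._params.items())


class ToyModel(object):
    """Hypothetical Python wrapper around a companion 'Java' model."""

    def __init__(self, java_model):
        self._java_obj = java_model
        self._paramMap = {}

    def _transfer_params_from_java(self):
        # Pull the params from the companion object instead of from the estimator.
        for name, value in self._java_obj.param_items():
            self._paramMap[name] = value

    def extractParamMap(self):
        return dict(self._paramMap)


class ToyEstimator(object):
    """Hypothetical estimator whose fit() builds the companion model first."""

    def __init__(self, **params):
        self._paramMap = dict(params)

    def fit(self, transfer=True):
        # Stand-in for the Scala side copying the estimator's params to the model.
        java_model = ToyJavaModel(self._paramMap)
        model = ToyModel(java_model)
        if transfer:
            model._transfer_params_from_java()
        return model


est = ToyEstimator(maxIter=10, regParam=0.01)
print(est.fit(transfer=False).extractParamMap())  # empty: the reported behaviour
print(est.fit(transfer=True).extractParamMap())   # populated from the companion model
```

The point of the design is that the Python wrapper reads from a single source of truth (the companion model) rather than duplicating the estimator's map by hand.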



[GitHub] spark pull request: [SPARK-12468] [Pyspark]

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10419#issuecomment-166428318
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-12468] [Pyspark] extractParamMap return...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10419#issuecomment-212651937
  
    @ZacharySBrown Thanks for this PR.  I think it's a duplicate of [SPARK-10931], so could you close this PR please?  As @yanboliang mentioned, a good fix will require transferring the Params from Java, which will also require having the Models contain the actual Params.  It would be great to get your input on the other PR.
    
    @chrispe92 There is no great solution, but you can access the underlying Java object via the ```_java_obj``` attribute: ```list(pythonIndexer._java_obj.labels())```



[GitHub] spark pull request: [SPARK-12468] [Pyspark] extractParamMap return...

Posted by chrispe92 <gi...@git.apache.org>.

Github user chrispe92 commented on the pull request:

    https://github.com/apache/spark/pull/10419#issuecomment-199851640
  
    Is there any workaround until this gets fixed?
    For example, I would like to be able to save the parameters used for StringIndexerModel.
    Is this possible?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12468] [Pyspark] extractParamMap return...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10419

