Posted to issues@spark.apache.org by "Huaxin Gao (Jira)" <ji...@apache.org> on 2019/11/01 22:32:00 UTC

[jira] [Commented] (SPARK-29691) Estimator fit method fails to copy params (in PySpark)

    [ https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965128#comment-16965128 ] 

Huaxin Gao commented on SPARK-29691:
------------------------------------

I checked the doc and the implementation. The Estimator fits the model using the optional params passed to fit instead of its embedded params, but it doesn't overwrite the estimator's embedded param values. In your case, the estimator uses 0.75 to fit the model, but it still keeps 0.8 for its own elasticNetParam. If you get the model's parameters, it should have 0.75 for elasticNetParam. This seems to work as designed.
# Fit the model, but with an updated parameter setting:
lrModel = lr.fit(training, params={lr.elasticNetParam : 0.75})
print("After:", lrModel.getOrDefault("elasticNetParam"))  # prints 0.75
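
To make the copy-then-fit behavior concrete, here is a minimal pure-Python sketch. ToyEstimator and ToyModel are hypothetical stand-ins invented for illustration; they are not PySpark classes. Keying the params by plain strings is a simplification of the sketch. It only mirrors the general shape of fit: copy the estimator, apply the overrides to the copy, and fit from the copy.

```python
import copy


class ToyEstimator:
    """Hypothetical stand-in for an estimator; not PySpark's actual class."""

    def __init__(self, **params):
        self.params = dict(params)

    def fit(self, data, params=None):
        # Work on a copy so the overrides never touch this estimator.
        working = copy.deepcopy(self)
        if params:
            working.params.update(params)
        return working._fit(data)

    def _fit(self, data):
        # A real estimator would train here; the toy model only records
        # the parameter values that were in effect during fitting.
        return ToyModel(dict(self.params))


class ToyModel:
    """Hypothetical fitted model that remembers the params used to fit it."""

    def __init__(self, params):
        self.params = params


lr = ToyEstimator(maxIter=10, regParam=0.3, elasticNetParam=0.8)
lr_model = lr.fit(data=[], params={"elasticNetParam": 0.75})

print("Estimator:", lr.params["elasticNetParam"])   # 0.8
print("Model:", lr_model.params["elasticNetParam"])  # 0.75
```

The estimator keeps its original value because only the copy sees the override, while the fitted model carries the overridden value.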

> Estimator fit method fails to copy params (in PySpark)
> ------------------------------------------------------
>
>                 Key: SPARK-29691
>                 URL: https://issues.apache.org/jira/browse/SPARK-29691
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.4
>            Reporter: John Bauer
>            Priority: Minor
>
> Estimator `fit` method is supposed to copy a dictionary of params, overwriting the estimator's previous values, before fitting the model. However, the parameter values are not updated.  This was observed in PySpark, but may be present in the Java objects, as the PySpark code appears to be functioning correctly.   (The copy method that interacts with Java is actually implemented in Params.)
> For example, this prints
> Before: 0.8
> After: 0.8
> but After should be 0.75
> {code:python}
> from pyspark.ml.classification import LogisticRegression
> # Load training data
> training = spark \
>     .read \
>     .format("libsvm") \
>     .load("data/mllib/sample_multiclass_classification_data.txt")
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> print("Before:", lr.getOrDefault("elasticNetParam"))
> # Fit the model, but with an updated parameter setting:
> lrModel = lr.fit(training, params={"elasticNetParam" : 0.75})
> print("After:", lr.getOrDefault("elasticNetParam"))
> {code}


