Posted to issues@spark.apache.org by "Tomas Nykodym (JIRA)" <ji...@apache.org> on 2018/02/08 18:08:00 UTC

[jira] [Reopened] (SPARK-23244) Incorrect handling of default values when deserializing python wrappers of scala transformers

     [ https://issues.apache.org/jira/browse/SPARK-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomas Nykodym reopened SPARK-23244:
-----------------------------------

I might be wrong, but I don't think this is a duplicate. The two issues overlap in that default values end up set as real (explicitly set) values, but:
1) I am not sure it happens in the same context. I looked at the PR addressing SPARK-23234, and its changes were in a method unrelated to my problem.
2) I don't see how SPARK-23234 addresses how default values based on uid are set after deserialization of JavaTransformer.



> Incorrect handling of default values when deserializing python wrappers of scala transformers
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23244
>                 URL: https://issues.apache.org/jira/browse/SPARK-23244
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.2.1
>            Reporter: Tomas Nykodym
>            Priority: Minor
>
> Default values are not handled properly when serializing/deserializing Python transformers that are wrappers around Scala objects. It looks like, after deserialization, the default values that were based on uid are not properly restored, and values that were not set are set to their (original) default values.
> Here's a simple code example using Bucketizer:
> {code:python}
> >>> from pyspark.ml.feature import Bucketizer
> >>> a = Bucketizer()
> >>> a.save("bucketizer0")
> >>> b = Bucketizer.load("bucketizer0")
> >>> a._defaultParamMap[a.outputCol]
> u'Bucketizer_440bb49206c148989db7__output'
> >>> b._defaultParamMap[b.outputCol]
> u'Bucketizer_41cf9afbc559ca2bfc9a__output'
> >>> a.isSet(a.outputCol)
> False 
> >>> b.isSet(b.outputCol)
> True
> >>> a.getOutputCol()
> u'Bucketizer_440bb49206c148989db7__output'
> >>> b.getOutputCol()
> u'Bucketizer_440bb49206c148989db7__output'
> {code}
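
The behaviour reported above can be reproduced in isolation with a toy model that does not use Spark at all. The sketch below is only an illustration under stated assumptions: `ToyTransformer`, `naive_load`, and `careful_load` are hypothetical names, not pyspark APIs, and the sketch assumes that the problem comes from a load step that copies every defined value (explicitly set or default) into the loaded object's set-parameter map. It may not match what pyspark actually does internally.

```python
import uuid


class ToyTransformer:
    """Hypothetical stand-in for a Python wrapper; NOT a pyspark class."""

    def __init__(self, uid=None):
        # A fresh uid is generated unless one is supplied.
        self.uid = uid or "Toy_" + uuid.uuid4().hex[:12]
        # The default output column is derived from the uid.
        self._defaults = {"outputCol": self.uid + "__output"}
        # Only values the user set explicitly belong here.
        self._set_params = {}

    def is_set(self, name):
        return name in self._set_params

    def is_defined(self, name):
        return name in self._set_params or name in self._defaults

    def get(self, name):
        if name in self._set_params:
            return self._set_params[name]
        return self._defaults[name]


def naive_load(saved):
    """Assumed bug pattern: copy every *defined* value as an explicit value."""
    loaded = ToyTransformer()  # new uid, so a new uid-based default
    for name in saved._defaults:
        if saved.is_defined(name):
            loaded._set_params[name] = saved.get(name)
    return loaded


def careful_load(saved):
    """One possible alternative: restore the uid, copy only explicit values."""
    loaded = ToyTransformer(uid=saved.uid)
    loaded._set_params.update(saved._set_params)
    return loaded


a = ToyTransformer()
b = naive_load(a)
c = careful_load(a)

print(a.is_set("outputCol"), b.is_set("outputCol"), c.is_set("outputCol"))
print(b.get("outputCol") == a.get("outputCol"))
print(b._defaults["outputCol"] == a._defaults["outputCol"])
```

In this toy model, `b` shows the same three symptoms as the Bucketizer session: its default differs from `a`'s, the parameter is reported as set, and the getter returns the value derived from `a`'s uid. `careful_load` is just one way the mismatch could be avoided in the model, not a proposed patch.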



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
