Posted to issues@spark.apache.org by "Huaxin Gao (Jira)" <ji...@apache.org> on 2019/10/15 16:07:00 UTC
[jira] [Commented] (SPARK-29414) HasOutputCol param isSet() property is not preserved after persistence
[ https://issues.apache.org/jira/browse/SPARK-29414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952075#comment-16952075 ]
Huaxin Gao commented on SPARK-29414:
------------------------------------
I somehow can't reproduce the problem. In my test, isSet(outputCol) returns False for loaded_model. Could you please try Spark 2.4?
> HasOutputCol param isSet() property is not preserved after persistence
> ----------------------------------------------------------------------
>
> Key: SPARK-29414
> URL: https://issues.apache.org/jira/browse/SPARK-29414
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Affects Versions: 2.3.2
> Reporter: Borys Biletskyy
> Priority: Major
>
> HasOutputCol param isSet() property is not preserved after saving and loading using DefaultParamsReadable and DefaultParamsWritable.
> {code:python}
> from typing import Optional
>
> from pyspark import keyword_only
> from pyspark.ml import Model
> from pyspark.ml.param.shared import HasInputCol, HasOutputCol
> from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
> from pyspark.sql import DataFrame
>
>
> class HasOutputColTester(Model,
>                          HasInputCol,
>                          HasOutputCol,
>                          DefaultParamsReadable,
>                          DefaultParamsWritable):
>     """Identity model used only to exercise param persistence."""
>
>     @keyword_only
>     def __init__(self, inputCol: Optional[str] = None, outputCol: Optional[str] = None):
>         super(HasOutputColTester, self).__init__()
>         kwargs = self._input_kwargs
>         self.setParams(**kwargs)
>
>     @keyword_only
>     def setParams(self, inputCol: Optional[str] = None, outputCol: Optional[str] = None):
>         # Only the keyword arguments the caller actually passed are set explicitly.
>         kwargs = self._input_kwargs
>         self._set(**kwargs)
>         return self
>
>     def _transform(self, data: DataFrame) -> DataFrame:
>         return data
>
>
> class TestHasInputColParam(object):
>     # `spark` and `temp_dir` are pytest fixtures (e.g. defined in conftest.py).
>     def test_persist_input_col_set(self, spark, temp_dir):
>         path = temp_dir + '/test_model'
>         model = HasOutputColTester()
>         # HasOutputCol supplies a default for outputCol; inputCol has no default.
>         assert not model.isDefined(model.inputCol)
>         assert not model.isSet(model.inputCol)
>         assert model.isDefined(model.outputCol)
>         assert not model.isSet(model.outputCol)
>         model.write().overwrite().save(path)
>
>         loaded_model: HasOutputColTester = HasOutputColTester.load(path)
>         assert not loaded_model.isDefined(loaded_model.inputCol)
>         assert not loaded_model.isSet(loaded_model.inputCol)
>         assert loaded_model.isDefined(loaded_model.outputCol)
>         assert not loaded_model.isSet(loaded_model.outputCol)  # AssertionError: assert not True
> {code}
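The failing assertion depends on the difference between isSet (an explicit value exists) and isDefined (an explicit or a default value exists); in PySpark, HasOutputCol gives outputCol a default of uid + "__output". One plausible mechanism for the symptom, offered here only as an assumption and not as a confirmed diagnosis, is a writer that saves default values into the same map as explicitly set values, so the loader can no longer tell them apart. The plain-Python sketch below models that idea with hypothetical names (ToyParams, make_tester, save_flattened, save_separately, load); it does not use or reproduce PySpark's actual persistence code.

```python
import json


class ToyParams:
    """Hypothetical stand-in for a Params object (NOT the real PySpark class).

    Keeps two maps: values set explicitly by the user and default values.
    """

    def __init__(self, uid):
        self.uid = uid
        self._paramMap = {}         # explicitly set values
        self._defaultParamMap = {}  # default values

    def isSet(self, name):
        # "set" means an explicit value exists
        return name in self._paramMap

    def isDefined(self, name):
        # "defined" means an explicit or a default value exists
        return name in self._paramMap or name in self._defaultParamMap


def make_tester(uid="HasOutputColTester_1"):
    # Hypothetical: mimic a uid-based default for outputCol; inputCol has none.
    p = ToyParams(uid)
    p._defaultParamMap["outputCol"] = uid + "__output"
    return p


def save_flattened(p):
    # Hypothetical writer that merges defaults into the explicit map.
    merged = dict(p._defaultParamMap)
    merged.update(p._paramMap)
    return json.dumps({"uid": p.uid, "paramMap": merged})


def save_separately(p):
    # Hypothetical writer that stores explicit values and defaults in separate fields.
    return json.dumps({"uid": p.uid,
                       "paramMap": p._paramMap,
                       "defaultParamMap": p._defaultParamMap})


def load(blob):
    # Rebuild the object (restoring its defaults), then apply saved values.
    meta = json.loads(blob)
    p = make_tester(meta["uid"])
    p._paramMap.update(meta["paramMap"])
    p._defaultParamMap.update(meta.get("defaultParamMap", {}))
    return p


model = make_tester()
flat = load(save_flattened(model))
split = load(save_separately(model))
print(flat.isSet("outputCol"))   # True: the default now looks explicitly set
print(split.isSet("outputCol"))  # False: isSet survives the round trip
```

Under this assumption, keeping defaults in their own field is what lets isSet() round-trip while isDefined() stays True either way.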
--
This message was sent by Atlassian Jira
(v8.3.4#803005)