Posted to issues@spark.apache.org by "Peter Knight (JIRA)" <ji...@apache.org> on 2018/09/10 16:56:00 UTC

[jira] [Commented] (SPARK-21542) Helper functions for custom Python Persistence

    [ https://issues.apache.org/jira/browse/SPARK-21542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609495#comment-16609495 ] 

Peter Knight commented on SPARK-21542:
--------------------------------------

It would be really helpful to have some example code showing how to use these classes.

I have tried: 
{code}
from pyspark.ml import Transformer
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable

class MedianTrend(Transformer, DefaultParamsReadable, DefaultParamsWritable):
    # code here to define Params and transform (omitted)
    pass

# instantiate it
mt1 = MedianTrend(inputColList=["v1"], outputColList=["v1_trend_no_reset"], sortCol="date")

# then save it
path1 = "test_MedianTrend"
mt1.write().overwrite().save(path1)

# then load it
mt1_loaded = mt1.load(path1)
df2 = mt1_loaded.transform(df)
df2.show()
{code}
This gives the following error:
{noformat}
'module' object has no attribute 'MedianTrend'
{noformat}
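For reference, here is a self-contained sketch of the pattern I believe is intended, based on the 2.3.0 API. The ColumnDoubler class, its Params, and the column names below are invented for illustration and are not part of the Spark API; the points that matter are declaring the Params with Params._dummy(), keeping the constructor callable with no arguments, and calling load() on the class:
{code}
from pyspark import keyword_only
from pyspark.ml import Transformer
from pyspark.ml.param import Param, Params, TypeConverters
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
import pyspark.sql.functions as F

class ColumnDoubler(Transformer, DefaultParamsReadable, DefaultParamsWritable):
    """Illustrative transformer: writes inputCol * 2 into outputCol."""

    # Params are declared as class attributes with a dummy parent;
    # Params.__init__ copies them onto each new instance.
    inputCol = Param(Params._dummy(), "inputCol", "name of the input column",
                     typeConverter=TypeConverters.toString)
    outputCol = Param(Params._dummy(), "outputCol", "name of the output column",
                      typeConverter=TypeConverters.toString)

    @keyword_only
    def __init__(self, inputCol=None, outputCol=None):
        # must work with no arguments: DefaultParamsReader instantiates
        # the class first and sets the saved Params afterwards
        super(ColumnDoubler, self).__init__()
        self._set(**self._input_kwargs)

    def _transform(self, dataset):
        return dataset.withColumn(self.getOrDefault("outputCol"),
                                  F.col(self.getOrDefault("inputCol")) * 2)

# assumes an active SparkSession
doubler = ColumnDoubler(inputCol="v1", outputCol="v1_doubled")
doubler.write().overwrite().save("test_ColumnDoubler")

# load through the class rather than an instance
restored = ColumnDoubler.load("test_ColumnDoubler")
{code}
Regarding the AttributeError itself: as far as I can tell, DefaultParamsReader records the fully qualified class name in the saved metadata and re-imports the class from that module path at load time, so the class has to be importable under the same name when load() is called. Defining it in an interactive shell or notebook cell is exactly the situation where that lookup can fail.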

> Helper functions for custom Python Persistence
> ----------------------------------------------
>
>                 Key: SPARK-21542
>                 URL: https://issues.apache.org/jira/browse/SPARK-21542
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, PySpark
>    Affects Versions: 2.2.0
>            Reporter: Ajay Saini
>            Assignee: Ajay Saini
>            Priority: Major
>             Fix For: 2.3.0
>
>
> Currently, there is no way to easily persist JSON-serializable parameters in Python alone. All parameters in Python are persisted by converting them to Java objects and using the Java persistence implementation. To facilitate the creation of custom Python-only pipeline stages, it would be good to have a Python-only persistence framework so that these stages do not need to be implemented in Scala just to support persistence.
> This task involves:
> - Adding implementations for DefaultParamsReadable, DefaultParamsWritable, DefaultParamsReader, and DefaultParamsWriter in pyspark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org