Posted to issues@spark.apache.org by "John Bauer (JIRA)" <ji...@apache.org> on 2018/11/09 19:55:00 UTC

[jira] [Comment Edited] (SPARK-21542) Helper functions for custom Python Persistence

    [ https://issues.apache.org/jira/browse/SPARK-21542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681895#comment-16681895 ] 

John Bauer edited comment on SPARK-21542 at 11/9/18 7:54 PM:
-------------------------------------------------------------

This is (a) much more minimal, (b) genuinely useful, and (c) actually works with save and load, for example:
{code:python}
impute.write().save("impute")
imp = ImputeNormal.load("impute")
imp.explainParams()
impute_model.write().save("impute_model")
impm = ImputeNormalModel.load("impute_model")
impm.getInputCol()
impm.getOutputCol()
impm.getMean()
impm.getStddev()
{code}
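For context, the Python-only persistence being discussed boils down to writing a stage's params out as JSON metadata. A simplified standard-library sketch of that idea follows; the field names and helper functions here are illustrative assumptions, not pyspark's actual implementation:

```python
import json
import time

def write_stage_metadata(cls_name, uid, param_map):
    # Illustrative sketch: serialize a stage's params as JSON metadata,
    # roughly in the spirit of a DefaultParamsWriter-style helper.
    # The field names are assumptions, not pyspark's exact schema.
    metadata = {
        "class": cls_name,
        "timestamp": int(time.time() * 1000),
        "uid": uid,
        "paramMap": param_map,  # every value must be JSON-serializable
    }
    return json.dumps(metadata)

def read_stage_metadata(text):
    # Inverse of the writer: recover the metadata dict from JSON.
    return json.loads(text)

# Round trip: params survive save/load without converting to Java objects.
saved = write_stage_metadata("ImputeNormal", "ImputeNormal_abc123",
                             {"inputCol": "x", "outputCol": "x_imputed"})
restored = read_stage_metadata(saved)
```

The point of the sketch is only that a Python stage whose params are all JSON-serializable needs no Scala counterpart to persist itself.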


was (Author: johnhbauer):
This is a) much more minimal, b) genuinely useful, and c) actually works with save and load, for example:

impute.write().save("impute")
imp = ImputeNormal.load("impute")
imp.explainParams()
impute_model.write().save("impute_model")
impm = ImputeNormalModel.load("imputer_model")
impm = ImputeNormalModel.load("impute_model")
impm.getInputCol()
impm.getOutputCol()
impm.getMean()
impm.getStddev()

> Helper functions for custom Python Persistence
> ----------------------------------------------
>
>                 Key: SPARK-21542
>                 URL: https://issues.apache.org/jira/browse/SPARK-21542
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, PySpark
>    Affects Versions: 2.2.0
>            Reporter: Ajay Saini
>            Assignee: Ajay Saini
>            Priority: Major
>             Fix For: 2.3.0
>
>
> Currently, there is no way to easily persist JSON-serializable parameters in Python only. All parameters in Python are persisted by converting them to Java objects and using the Java persistence implementation. In order to facilitate the creation of custom Python-only pipeline stages, it would be good to have a Python-only persistence framework so that these stages do not need to be implemented in Scala just for persistence.
> This task involves:
> - Adding implementations for DefaultParamsReadable, DefaultParamsWritable, DefaultParamsReader, and DefaultParamsWriter in pyspark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org