Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2016/04/13 20:33:25 UTC
[jira] [Resolved] (SPARK-6725) Model export/import for Pipeline API (Scala)
[ https://issues.apache.org/jira/browse/SPARK-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley resolved SPARK-6725.
--------------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0
I'm marking this complete since we now have full coverage. As more features are added, export/import support can be added in separate JIRAs. Thanks everyone for helping so much!
> Model export/import for Pipeline API (Scala)
> --------------------------------------------
>
> Key: SPARK-6725
> URL: https://issues.apache.org/jira/browse/SPARK-6725
> Project: Spark
> Issue Type: Umbrella
> Components: ML
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Assignee: Joseph K. Bradley
> Priority: Critical
> Fix For: 2.0.0
>
>
> This is an umbrella JIRA for adding model export/import to the spark.ml API. This JIRA is for adding the internal Saveable/Loadable API and Parquet-based format, not for other formats like PMML.
> This will require the following steps:
> * Add export/import for all PipelineStages supported by spark.ml
> ** This will include some Transformers which are not Models.
> ** These can use almost the same format as the spark.mllib model save/load functions, but the model metadata must store a different class name (marking the class as a spark.ml class).
> * After all PipelineStages support save/load, add an interface which forces future additions to support save/load.
> *UPDATE*: In spark.ml, we could save feature metadata using DataFrames. Other libraries and formats can support this, and it would be great if we could too. We could do either of the following:
> * save() optionally takes a dataset (or schema), and load will return a (model, schema) pair.
> * Models themselves save the input schema.
> Both options would mean inheriting from new Saveable, Loadable types.
> *UPDATE: DESIGN DOC*: Here's a design doc which I wrote. If you have comments about the planned implementation, please comment in this JIRA. Thanks! [https://docs.google.com/document/d/1RleM4QiKwdfZZHf0_G6FBNaF7_koc1Ui7qfMT1pf4IA/edit?usp=sharing]
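To illustrate the two ideas in the description above (metadata that records the class name, and a load that returns a (model, schema) pair), here is a minimal sketch. It is not the spark.ml implementation and does not use Spark: ToyScaler, CLASS_NAME, and metadata.json are hypothetical names chosen for illustration, and JSON stands in for the Parquet-based format so that the example runs with only the Python standard library.

```python
import json
import os

# Hypothetical fully qualified class name stored in the metadata. In the
# JIRA's proposal this is what marks a saved model as a spark.ml class
# rather than a spark.mllib one.
CLASS_NAME = "example.ml.ToyScaler"


class ToyScaler:
    """Toy stand-in for a pipeline stage; not a Spark class."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, values):
        # Multiply every input value by the stored factor.
        return [v * self.factor for v in values]

    def save(self, path, schema=None):
        # Write the class name, the parameters, and the optional input
        # schema into a single metadata file under `path`.
        os.makedirs(path, exist_ok=True)
        metadata = {
            "class": CLASS_NAME,
            "params": {"factor": self.factor},
            "schema": schema,
        }
        with open(os.path.join(path, "metadata.json"), "w") as f:
            json.dump(metadata, f)

    @classmethod
    def load(cls, path):
        # Read the metadata back and refuse to load a different class.
        with open(os.path.join(path, "metadata.json")) as f:
            metadata = json.load(f)
        if metadata["class"] != CLASS_NAME:
            raise ValueError(
                "expected %s, found %s" % (CLASS_NAME, metadata["class"])
            )
        # Return a (model, schema) pair, as in the first option above.
        return cls(**metadata["params"]), metadata["schema"]
```

Calling ToyScaler(2.0).save(path, schema={"x": "double"}) followed by ToyScaler.load(path) gives back an equivalent model together with the saved schema. The class-name check is what lets a loader reject metadata written by a different class.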