Posted to issues@spark.apache.org by "yuhao yang (JIRA)" <ji...@apache.org> on 2016/06/01 08:42:59 UTC

[jira] [Commented] (SPARK-15573) Backwards-compatible persistence for spark.ml

    [ https://issues.apache.org/jira/browse/SPARK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309946#comment-15309946 ] 

yuhao yang commented on SPARK-15573:
------------------------------------

IMO, this looks more like a release task than a CI prerequisite.
Instead of unit tests, perhaps we can have a separate SaveLoadTest.scala file that just saves and loads all the models. Then we can create a corresponding subtask in the QA phase. The assignee can save the models with previous releases and load them with the latest version (it shouldn't take long).

pro: Full coverage, centralized management (easier to maintain), reduced unit test time, and we don't need to keep the save logic for previous versions.
con: A compatibility violation may not be detected immediately.
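
A rough sketch of what such a SaveLoadTest.scala could look like, in case it helps the discussion. It only covers the load half, run against the latest version. The directory layout (one subdirectory per model, named after the model class), the checkAll helper, and the choice of the two models listed are assumptions made for illustration, not something that exists in Spark. The save half would be a similar small program run once with each previous release.

```scala
import java.io.File
import scala.util.{Failure, Success, Try}

import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.clustering.KMeansModel
import org.apache.spark.sql.SparkSession

object SaveLoadTest {

  // Hypothetical registry: subdirectory name -> function that loads the model.
  // A real version would list every persistable model in spark.ml.
  val loaders: Seq[(String, String => Any)] = Seq(
    "KMeansModel" -> ((path: String) => KMeansModel.load(path)),
    "LogisticRegressionModel" -> ((path: String) => LogisticRegressionModel.load(path))
  )

  // Tries to load every registered model from root/<name> and returns one
  // message per model that is missing or fails to load.
  def checkAll(root: File, loaders: Seq[(String, String => Any)]): Seq[String] =
    loaders.flatMap { case (name, load) =>
      val dir = new File(root, name)
      if (!dir.isDirectory) Seq(s"$name: no saved model found")
      else Try(load(dir.getPath)) match {
        case Success(_) => Nil
        case Failure(e) => Seq(s"$name: ${e.getMessage}")
      }
    }

  def main(args: Array[String]): Unit = {
    // args(0): directory holding the models saved with a previous release (assumed layout).
    val root = new File(args(0))
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SaveLoadTest")
      .getOrCreate()
    val failures = try checkAll(root, loaders) finally spark.stop()
    failures.foreach(msg => println(s"FAILED $msg"))
    if (failures.nonEmpty) sys.exit(1)
  }
}
```

This only checks that loading succeeds. Comparing the loaded parameters or predictions against expected values would be a stronger check, but it needs reference data stored next to each saved model.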



> Backwards-compatible persistence for spark.ml
> ---------------------------------------------
>
>                 Key: SPARK-15573
>                 URL: https://issues.apache.org/jira/browse/SPARK-15573
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is for imposing backwards-compatible persistence for the DataFrames-based API for MLlib.  I.e., we want to be able to load models saved in previous versions of Spark.  We will not require loading models saved in later versions of Spark.
> This requires:
> * Putting unit tests in place to check loading models from previous versions
> * Notifying all committers active on MLlib to be aware of this requirement in the future
> The unit tests could be written as in spark.mllib, where we essentially copied and pasted the save() code every time it changed.  This happens rarely, so it should be acceptable, though other designs are fine.
> Subtasks of this JIRA should cover checking and adding tests for existing cases, such as KMeansModel (whose format changed between 1.6 and 2.0).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
