Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2016/09/09 20:18:20 UTC
[jira] [Commented] (SPARK-15573) Backwards-compatible persistence for spark.ml

    [ https://issues.apache.org/jira/browse/SPARK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478131#comment-15478131 ] 

Joseph K. Bradley commented on SPARK-15573:
-------------------------------------------

I'd prefer to put this in unit tests to avoid more manual QA.  Rather than keeping logic around to generate saved models from previous Spark versions, I propose:
* Add a folder {{mllib/src/test/resources/persistence/}} for storing models from each Spark version from now on.
** E.g., for LDA, store a saved instance in {{mllib/src/test/resources/persistence/2.0/clustering/LDA}}
** We should have a standard way to generate these, and new ones should be required whenever a new Params type can be saved/loaded.  Once added, we should be very careful about ever changing the saved resource files.
* Unit tests should test loading all available versions.
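
As a rough sketch of the "load all available versions" idea (written in Python with only the standard library rather than as a real Spark test; {{discover_saved_models}}, {{check_all_versions_load}}, and the {{loaders}} mapping are hypothetical names, not Spark APIs), a test could walk the proposed {{persistence/<version>/<package>/<Model>}} layout and try each saved instance:

```python
import os


def discover_saved_models(root):
    """Return (version, package, model_name, path) tuples found under root.

    Assumes the layout root/<version>/<package>/<Model>, e.g.
    root/2.0/clustering/LDA. Entries are sorted lexically so the
    test order is deterministic.
    """
    found = []
    for version in sorted(os.listdir(root)):
        version_dir = os.path.join(root, version)
        if not os.path.isdir(version_dir):
            continue
        for package in sorted(os.listdir(version_dir)):
            package_dir = os.path.join(version_dir, package)
            if not os.path.isdir(package_dir):
                continue
            for name in sorted(os.listdir(package_dir)):
                model_dir = os.path.join(package_dir, name)
                if os.path.isdir(model_dir):
                    found.append((version, package, name, model_dir))
    return found


def check_all_versions_load(root, loaders):
    """Try to load every saved model; return a list of failures.

    `loaders` is a hypothetical mapping from model name to a callable
    that takes a path and raises if the saved instance cannot be loaded.
    """
    failures = []
    for version, _package, name, path in discover_saved_models(root):
        loader = loaders.get(name)
        if loader is None:
            # A saved instance without a loader means a test is missing.
            failures.append((version, name, "no loader registered"))
            continue
        try:
            loader(path)
        except Exception as exc:
            failures.append((version, name, repr(exc)))
    return failures
```

In an actual test, each loader would presumably wrap the corresponding {{load}} method, and the test would assert that the returned list is empty.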

We can check that the tests do not take too long and that the resource files are not too large.  (The files would be included only in the test package, not in the main package most people would use.)
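
The size check could be a small guard along these lines (again a standard-library Python sketch; {{oversized_files}} and the {{max_bytes}} threshold are hypothetical, and the actual limit would need to be agreed on):

```python
import os


def oversized_files(root, max_bytes):
    """Return sorted (relative_path, size_in_bytes) pairs for files over max_bytes."""
    too_big = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            size = os.path.getsize(full_path)
            if size > max_bytes:
                too_big.append((os.path.relpath(full_path, root), size))
    return sorted(too_big)
```

A test would then assert that this returns an empty list for the persistence resource folder.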

What do you think?

> Backwards-compatible persistence for spark.ml
> ---------------------------------------------
>
>                 Key: SPARK-15573
>                 URL: https://issues.apache.org/jira/browse/SPARK-15573
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is for imposing backwards-compatible persistence for the DataFrames-based API for MLlib.  I.e., we want to be able to load models saved in previous versions of Spark.  We will not require loading models saved in later versions of Spark.
> This requires:
> * Putting unit tests in place to check loading models from previous versions
> * Notifying all committers active on MLlib to be aware of this requirement in the future
> The unit tests could be written as in spark.mllib, where we essentially copied and pasted the save() code every time it changed.  This happens rarely, so it should be acceptable, though other designs are fine.
> Subtasks of this JIRA should cover checking and adding tests for existing cases, such as KMeansModel (whose format changed between 1.6 and 2.0).


