Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2016/04/30 05:59:12 UTC

[jira] [Resolved] (SPARK-14311) Model persistence in SparkR 2.0

     [ https://issues.apache.org/jira/browse/SPARK-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-14311.
-----------------------------------
    Resolution: Fixed

> Model persistence in SparkR 2.0
> -------------------------------
>
>                 Key: SPARK-14311
>                 URL: https://issues.apache.org/jira/browse/SPARK-14311
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, SparkR
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> In Spark 2.0, we are going to have 4 ML models in SparkR: GLMs, k-means, naive Bayes, and AFT survival regression. Users can fit models, get summaries, and make predictions. However, they cannot save/load the models yet.
> ML models in SparkR are wrappers around ML pipelines. So it should be straightforward to implement model persistence. We need to think more about the API. R uses save/load for objects and datasets (also objects). It is possible to overload save for ML models, e.g., save.NaiveBayesWrapper. But I'm not sure whether load can be overloaded easily. I propose the following API:
> {code}
> model <- glm(formula, data = df)
> ml.save(model, path, mode = "overwrite")
> model2 <- ml.load(path)
> {code}
> We defined wrappers as S4 classes. So `ml.save` is an S4 method and `ml.load` is an S3 method (correct me if I'm wrong).
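
A minimal sketch of the save/load shape proposed in the description, written in plain R so it runs without a Spark session. Everything here is an assumption made for illustration: the toy NaiveBayesWrapper class with a params slot, the "error" default for mode, and the use of saveRDS/readRDS in place of the ML pipeline writer. It only shows why saving can dispatch on the wrapper class while loading receives just a path, so it has to read the stored class name and pick the wrapper itself.

```r
library(methods)

# Hypothetical stand-in for a SparkR model wrapper (not the real class).
setClass("NaiveBayesWrapper", representation(params = "list"))

# Saving dispatches on the wrapper's S4 class.
setGeneric("ml.save", function(object, path, mode = "error") {
  standardGeneric("ml.save")
})

setMethod("ml.save", signature(object = "NaiveBayesWrapper"),
  function(object, path, mode = "error") {
    # Assumed semantics: refuse to clobber an existing path unless asked to.
    if (file.exists(path) && mode != "overwrite") {
      stop("Path already exists: ", path)
    }
    # Store the class name next to the data so the loader can find it.
    saveRDS(list(class = class(object)[[1]], params = object@params), path)
    invisible(path)
  })

# Loading only sees a path, so it is an ordinary function that reads the
# stored class name and rebuilds the matching wrapper.
ml.load <- function(path) {
  stored <- readRDS(path)
  switch(stored$class,
    NaiveBayesWrapper = new("NaiveBayesWrapper", params = stored$params),
    stop("Unsupported model class: ", stored$class))
}

# Usage, mirroring the proposed API.
path <- tempfile(fileext = ".rds")
model <- new("NaiveBayesWrapper", params = list(smoothing = 1))
ml.save(model, path, mode = "overwrite")
model2 <- ml.load(path)
```

In this sketch ml.load is a plain function: dispatching on its only argument, a character path, would not distinguish between model types.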



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
