Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2015/04/03 19:57:55 UTC

[jira] [Commented] (SPARK-6682) Deprecate static train and use builder instead for Scala/Java

    [ https://issues.apache.org/jira/browse/SPARK-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394776#comment-14394776 ] 

Joseph K. Bradley commented on SPARK-6682:
------------------------------------------

Note: We could keep 2 APIs for Scala/Java, but this is not a great solution for 2 reasons:
* Two APIs mean more code to maintain, and they confuse users, who must figure out which API to use and whether the two behave the same.
* The static train() methods are not workable for algorithms with more than 10 parameters, since our Scala style constraints cap method parameter lists at 10.
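To illustrate the parameter-count point, here is a minimal, Spark-free Scala sketch of the builder style. `ToyRegression`, `ToyModel`, and their parameters are hypothetical names made up for this example (they are not MLlib classes), and training is plain gradient descent on squared loss. Every parameter gets a default and a chaining setter, so adding an eleventh parameter means one more field and setter rather than another train() overload.

```scala
// Hypothetical model type, used only for this illustration (not an MLlib class).
final case class ToyModel(weights: Vector[Double], intercept: Double)

// Hypothetical builder-style estimator (not an MLlib class).
class ToyRegression {
  // Every parameter has a default, so `new ToyRegression().run(data)` works with no setup.
  private var numIterations: Int = 100
  private var stepSize: Double = 1.0
  private var regParam: Double = 0.0
  private var fitIntercept: Boolean = true

  // Each setter returns `this`, so calls chain from Scala and Java alike.
  def setNumIterations(value: Int): ToyRegression = {
    require(value >= 0, s"numIterations must be >= 0 but got $value")
    numIterations = value
    this
  }
  def setStepSize(value: Double): ToyRegression = { stepSize = value; this }
  def setRegParam(value: Double): ToyRegression = { regParam = value; this }
  def setFitIntercept(value: Boolean): ToyRegression = { fitIntercept = value; this }

  def getNumIterations: Int = numIterations
  def getStepSize: Double = stepSize

  // Full-batch gradient descent on
  //   (1/2n) * sum((w.x + b - y)^2) + (regParam/2) * ||w||^2
  // where each training example is a (label, features) pair.
  def run(data: Seq[(Double, Vector[Double])]): ToyModel = {
    require(data.nonEmpty, "training data must be non-empty")
    val dim = data.head._2.length
    val n = data.length.toDouble
    var w = Vector.fill(dim)(0.0)
    var b = 0.0
    for (_ <- 0 until numIterations) {
      var gradW = w.map(_ * regParam) // gradient of the L2 penalty term
      var gradB = 0.0
      for ((y, x) <- data) {
        val err = w.zip(x).map { case (wi, xi) => wi * xi }.sum + b - y
        gradW = gradW.zip(x).map { case (g, xi) => g + err * xi / n }
        gradB += err / n
      }
      w = w.zip(gradW).map { case (wi, gi) => wi - stepSize * gi }
      if (fitIntercept) b -= stepSize * gradB
    }
    ToyModel(w, b)
  }
}
```

For example, `new ToyRegression().setStepSize(0.1).run(data)` overrides a single parameter and leaves the rest at their defaults, with no need for a matching train() overload.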

Also, once we add SparkR, we will not be able to keep the APIs uniform everywhere, since R's syntax is so different. We can make a best effort, but I feel we should tailor each API to its language when that makes sense.

> Deprecate static train and use builder instead for Scala/Java
> -------------------------------------------------------------
>
>                 Key: SPARK-6682
>                 URL: https://issues.apache.org/jira/browse/SPARK-6682
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> In MLlib, we have for some time been unofficially moving away from the old static train() methods and towards the builder pattern. This JIRA is to discuss this move and (hopefully) make it official.
> "Old static train()" API:
> {code}
> val myModel = NaiveBayes.train(myData, ...)
> {code}
> "New builder pattern" API:
> {code}
> val nb = new NaiveBayes().setLambda(0.1)
> val myModel = nb.train(myData)
> {code}
> Pros of the builder pattern:
> * Much less code when algorithms have many parameters. Since Java does not support default arguments, we required *many* duplicated static train() methods (one for each prefix of the argument list).
> * Helps to enforce default parameters.  Users should ideally not have to even think about setting parameters if they just want to try an algorithm quickly.
> * Matches spark.ml API
> Cons of the builder pattern:
> * In Python APIs, static train methods are more "Pythonic."
> Proposal:
> * Scala/Java: We should start deprecating the old static train() methods.  We must keep them for API stability, but deprecating will help with API consistency, making it clear that everyone should use the builder pattern.  As we deprecate them, we should make sure that the builder pattern supports all parameters.
> * Python: Keep static train methods.
> CC: [~mengxr]
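A minimal sketch of the proposal, again using hypothetical, Spark-free names (`ToyNaiveBayes`, `ToyNaiveBayesModel`, the `modelType` parameter, and the `since` version string are all made up for illustration): the static train() overloads stay for API stability but are marked `@deprecated` and delegate to the builder, so both entry points share one code path and the builder exposes every parameter.

```scala
// Hypothetical model: "training" only records its inputs, to keep the sketch small.
final case class ToyNaiveBayesModel(lambda: Double, modelType: String, numExamples: Int)

// Builder-style API: defaults live in one place and each setter returns `this`.
class ToyNaiveBayes {
  private var lambda: Double = 1.0
  private var modelType: String = "multinomial"

  def setLambda(value: Double): ToyNaiveBayes = {
    require(value >= 0.0, s"lambda must be >= 0 but got $value")
    lambda = value
    this
  }

  def setModelType(value: String): ToyNaiveBayes = {
    require(Set("multinomial", "bernoulli").contains(value), s"unknown modelType: $value")
    modelType = value
    this
  }

  // Stand-in for real training; each example is a (label, features) pair.
  def run(data: Seq[(Double, Array[Double])]): ToyNaiveBayesModel =
    ToyNaiveBayesModel(lambda, modelType, data.size)
}

object ToyNaiveBayes {
  // Old static API: one overload per prefix of the argument list, since Java callers
  // cannot use Scala default arguments. Kept for stability, deprecated, and reduced
  // to thin wrappers around the builder. "1.4.0" is a hypothetical version.
  @deprecated("Use new ToyNaiveBayes().run(data) instead", "1.4.0")
  def train(data: Seq[(Double, Array[Double])]): ToyNaiveBayesModel =
    new ToyNaiveBayes().run(data)

  @deprecated("Use new ToyNaiveBayes().setLambda(lambda).run(data) instead", "1.4.0")
  def train(data: Seq[(Double, Array[Double])], lambda: Double): ToyNaiveBayesModel =
    new ToyNaiveBayes().setLambda(lambda).run(data)

  @deprecated("Use new ToyNaiveBayes().setLambda(lambda).setModelType(modelType).run(data) instead", "1.4.0")
  def train(data: Seq[(Double, Array[Double])], lambda: Double, modelType: String): ToyNaiveBayesModel =
    new ToyNaiveBayes().setLambda(lambda).setModelType(modelType).run(data)
}
```

Existing callers of the static methods keep compiling, while scalac reports a deprecation warning (with details under `-deprecation`) whose message names the builder equivalent.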



