Posted to issues@spark.apache.org by "Stefan Krawczyk (JIRA)" <ji...@apache.org> on 2016/03/24 18:25:25 UTC

[jira] [Commented] (SPARK-14033) Merging Estimator & Model

    [ https://issues.apache.org/jira/browse/SPARK-14033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210594#comment-15210594 ] 

Stefan Krawczyk commented on SPARK-14033:
-----------------------------------------

Nitpick: your document mentions MLlib, but really this is about spark.ml, right?

Questions:
1) What is lacking in the current Spark documentation that makes this transition/onboarding difficult for users coming from scikit-learn?
2) The distinction between MLlib and spark.ml is confusing at first. Do you think this is perhaps part of the problem?
3) Can you give examples of what is unclear about the current semantics? I would argue the main concepts (http://spark.apache.org/docs/latest/ml-guide.html#main-concepts-in-pipelines) are quite crisp. I agree with [~daniel.siegmann.aol] here that this would make things less clear.
4) Wouldn't this proposal make the code more complex to maintain going forward, since it couples training code more tightly with prediction code?

I agree that technology adoption is important for an open source project to survive. However, I don't think this proposal will make machine learning simpler to use; I think the pipeline concept, with separate transformers and estimators, has already made good progress on this very point.
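
To make the distinction under discussion concrete, here is a minimal plain-Python sketch. It is not Spark or scikit-learn code, and every class name in it (MeanEstimator, MeanModel, MergedMeanCenterer) is hypothetical. The first pair keeps training and prediction in separate objects, roughly in the spirit of the current Estimator/Model split; the last class merges them into one mutable object, roughly in the spirit of the scikit-learn style the proposal refers to.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MeanModel:
    """Fitted artifact (hypothetical): learned state plus prediction logic only."""
    mean: float

    def transform(self, xs: List[float]) -> List[float]:
        # Center each value using the mean learned during fit().
        return [x - self.mean for x in xs]


@dataclass(frozen=True)
class MeanEstimator:
    """Training side (hypothetical): fit() returns a new, immutable MeanModel."""

    def fit(self, xs: List[float]) -> MeanModel:
        return MeanModel(mean=sum(xs) / len(xs))


class MergedMeanCenterer:
    """Merged style (hypothetical): one object that fit() mutates and returns."""

    def __init__(self) -> None:
        self.mean_: Optional[float] = None  # unset until fit() is called

    def fit(self, xs: List[float]) -> "MergedMeanCenterer":
        self.mean_ = sum(xs) / len(xs)
        return self

    def transform(self, xs: List[float]) -> List[float]:
        if self.mean_ is None:
            raise RuntimeError("transform() called before fit()")
        return [x - self.mean_ for x in xs]
```

In the separated version, an unfitted object has no transform() method at all, so calling it before training is impossible by construction. In the merged version the same mistake is only caught at run time, and training and prediction code live in the same class, which is the coupling question 4 asks about.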

> Merging Estimator & Model
> -------------------------
>
>                 Key: SPARK-14033
>                 URL: https://issues.apache.org/jira/browse/SPARK-14033
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>         Attachments: StyleMutabilityMergingEstimatorandModel.pdf
>
>
> This JIRA is for merging the spark.ml concepts of Estimator and Model.
> Goal: Have clearer semantics which match existing libraries (such as scikit-learn).
> For details, please see the linked design doc.  Comment on this JIRA to give feedback on the proposed design.  Once the proposal is discussed and this work is confirmed as ready to proceed, this JIRA will serve as an umbrella for the merge tasks.


