Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2015/05/07 01:33:59 UTC

[jira] [Created] (SPARK-7412) Designing distributed prediction model abstractions for spark.ml

Joseph K. Bradley created SPARK-7412:
----------------------------------------

             Summary: Designing distributed prediction model abstractions for spark.ml
                 Key: SPARK-7412
                 URL: https://issues.apache.org/jira/browse/SPARK-7412
             Project: Spark
          Issue Type: Brainstorming
          Components: ML
            Reporter: Joseph K. Bradley


The Pipelines API (spark.ml package) now includes abstractions for single-label prediction: Predictor, Classifier, Regressor. These assume that models are local, so that single-Row prediction methods can be applied as UDFs. We need to think about how to support distributed models in these abstractions.

Should the abstractions be modified somehow? Or should there be parallel (or inheriting) abstractions, or a mix-in?
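
To make the question concrete, here is a rough sketch of the "parallel abstractions" option in plain Python. It is not the spark.ml API: LocalModel, DistributedModel, LocalLinearModel and ShardedLinearModel are hypothetical names invented for illustration, and a list of weight shards stands in for parameters that would really be spread across a cluster. The point is only that a local model can expose a per-row predict() that is mapped over the data like a UDF, while a distributed model can only sensibly expose a dataset-level transform().

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

Vector = Sequence[float]


class LocalModel(ABC):
    """Hypothetical: a model that fits on one machine."""

    @abstractmethod
    def predict(self, features: Vector) -> float:
        """Single-row prediction, usable like a UDF."""

    def transform(self, rows: List[Vector]) -> List[float]:
        # Dataset-level prediction is just the per-row method mapped over rows.
        return [self.predict(row) for row in rows]


class DistributedModel(ABC):
    """Hypothetical: a model whose parameters are partitioned."""

    @abstractmethod
    def transform(self, rows: List[Vector]) -> List[float]:
        """Only dataset-level prediction is offered; there is no per-row UDF."""


class LocalLinearModel(LocalModel):
    def __init__(self, weights: Vector) -> None:
        self.weights = list(weights)

    def predict(self, features: Vector) -> float:
        return sum(w * x for w, x in zip(self.weights, features))


class ShardedLinearModel(DistributedModel):
    """Weights split into contiguous shards (a stand-in for distributed storage)."""

    def __init__(self, weights: Vector, shard_size: int) -> None:
        self.shards: List[Tuple[int, List[float]]] = [
            (start, list(weights[start:start + shard_size]))
            for start in range(0, len(weights), shard_size)
        ]

    def transform(self, rows: List[Vector]) -> List[float]:
        totals = [0.0] * len(rows)
        # Each shard contributes a partial dot product; partials are summed per row.
        for start, chunk in self.shards:
            for i, row in enumerate(rows):
                part = row[start:start + len(chunk)]
                totals[i] += sum(w * x for w, x in zip(chunk, part))
        return totals


weights = [1.0, 2.0, 3.0, 4.0]
rows = [[1.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
local_predictions = LocalLinearModel(weights).transform(rows)
sharded_predictions = ShardedLinearModel(weights, shard_size=2).transform(rows)
```

In this sketch the two hierarchies share only transform(), which hints at the trade-off: an inheriting design or a mix-in would have to decide whether predict() stays on the common base type or moves to the local-only side.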

Motivation: We may start supporting distributed models, since linear models, random forests, and other models can grow large enough to merit distributed storage and computation.


