Posted to issues@spark.apache.org by "Peter Rudenko (JIRA)" <ji...@apache.org> on 2015/04/06 17:56:12 UTC

[jira] [Commented] (SPARK-3702) Standardize MLlib classes for learners, models

    [ https://issues.apache.org/jira/browse/SPARK-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481342#comment-14481342 ] 

Peter Rudenko commented on SPARK-3702:
--------------------------------------

For tree-based algorithms, I'm curious whether there would be a performance benefit from passing DataFrame columns directly rather than a single column of vector type. E.g.:

{code}
// Proposed: the estimator reads its features from several named columns
class GBT extends Estimator with HasInputCols

val gbt = new GBT().setInputCols("col1", "col2", "col3", ...)
{code}
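
To make the idea a bit more concrete, here is a minimal, self-contained sketch in plain Scala with no Spark dependency. Everything in it is an assumption made for illustration: this {{HasInputCols}} trait, the toy {{GBT}} class, the {{selectFeatures}} helper, and the column-oriented {{Map}} standing in for a DataFrame are hypothetical and are not Spark's actual API. It only shows how a fluent {{setInputCols}} setter could let an estimator pull the named columns directly instead of first assembling a per-row vector; it says nothing about whether this would actually be faster.

```scala
// Hypothetical sketch only: HasInputCols and GBT here are toy stand-ins,
// not Spark's actual classes.
trait HasInputCols[T] { self: T =>
  private var inputCols: Vector[String] = Vector.empty

  // Fluent setter so that calls can be chained.
  def setInputCols(cols: String*): T = {
    inputCols = cols.toVector
    this
  }

  def getInputCols: Vector[String] = inputCols
}

// A toy "estimator" that only records which columns it would read.
class GBT extends HasInputCols[GBT] {
  // Select the configured columns from a column-oriented table
  // (column name -> values) without assembling per-row vectors.
  def selectFeatures(table: Map[String, Array[Double]]): Vector[Array[Double]] =
    getInputCols.map(table)
}

object InputColsDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for a DataFrame: three named numeric columns.
    val table = Map(
      "col1" -> Array(1.0, 2.0),
      "col2" -> Array(3.0, 4.0),
      "col3" -> Array(5.0, 6.0)
    )
    val gbt = new GBT().setInputCols("col1", "col3")
    val features = gbt.selectFeatures(table)
    println(features.map(_.mkString("[", ", ", "]")).mkString(" "))
  }
}
```

A tree learner typically evaluates candidate splits one feature at a time, so keeping each feature as its own column, as in the toy table here, is the access pattern the question above is about.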

> Standardize MLlib classes for learners, models
> ----------------------------------------------
>
>                 Key: SPARK-3702
>                 URL: https://issues.apache.org/jira/browse/SPARK-3702
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Blocker
>
> Summary: Create a class hierarchy for learning algorithms and the models those algorithms produce.
> This is a super-task of several sub-tasks (but JIRA does not allow subtasks of subtasks).  See the "requires" links below for subtasks.
> Goals:
> * give intuitive structure to API, both for developers and for generated documentation
> * support meta-algorithms (e.g., boosting)
> * support generic functionality (e.g., evaluation)
> * reduce code duplication across classes
> [Design doc for class hierarchy | https://docs.google.com/document/d/1BH9el33kBX8JiDdgUJXdLW14CA2qhTCWIG46eXZVoJs]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
