Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2014/09/19 00:07:35 UTC
[jira] [Comment Edited] (SPARK-3530) Pipeline and Parameters

    [ https://issues.apache.org/jira/browse/SPARK-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139600#comment-14139600 ] 

Xiangrui Meng edited comment on SPARK-3530 at 9/18/14 10:06 PM:
----------------------------------------------------------------

[~eustache] The default implementation of multi-model training will be a for loop. But the API leaves room for future optimizations, such as grouping weight vectors and using level-3 BLAS for better performance. It shouldn't be a meta class, because many optimizations are algorithm-specific. For example, LASSO can be solved via LARS, which computes the full solution path over all regularization parameters. The level-3 BLAS optimization is another example, which can give an 8x speedup (SPARK-1486).
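
To make the shape of this concrete, here is a minimal Scala sketch. Every name in it (Estimator, fitAll, ParamMap, LoopRidge, SharedStatsRidge, "regParam") is hypothetical and not taken from the design doc. It uses a toy one-dimensional ridge regression rather than LARS or level-3 BLAS, only to show a default loop that one specific algorithm overrides with a cheaper method.

```scala
// Illustrative sketch only: all names are hypothetical, not the proposed Spark API.
object MultiModelSketch {
  type ParamMap = Map[String, Double]
  type Data = Seq[(Double, Double)] // (feature, label) pairs

  final case class Model(weight: Double)

  trait Estimator {
    /** Fit one model for one parameter setting. */
    def fit(data: Data, params: ParamMap): Model

    /** Default multi-model training: a plain loop over the parameter settings. */
    def fitAll(data: Data, paramMaps: Seq[ParamMap]): Seq[Model] =
      paramMaps.map(pm => fit(data, pm))
  }

  /** Toy 1-D ridge regression without intercept: w = sum(x*y) / (sum(x*x) + lambda). */
  class LoopRidge extends Estimator {
    def fit(data: Data, params: ParamMap): Model = {
      val lambda = params.getOrElse("regParam", 0.0)
      val sxy = data.map { case (x, y) => x * y }.sum
      val sxx = data.map { case (x, _) => x * x }.sum
      Model(sxy / (sxx + lambda))
    }
  }

  /** Algorithm-specific override: compute the statistics once and reuse
    * them for every regularization value instead of re-scanning the data. */
  class SharedStatsRidge extends LoopRidge {
    override def fitAll(data: Data, paramMaps: Seq[ParamMap]): Seq[Model] = {
      val sxy = data.map { case (x, y) => x * y }.sum
      val sxx = data.map { case (x, _) => x * x }.sum
      paramMaps.map(pm => Model(sxy / (sxx + pm.getOrElse("regParam", 0.0))))
    }
  }
}
```

Because the override lives in the estimator itself, callers always go through fitAll and get the faster path when one exists. A generic meta class that wraps fit in a loop would have no way to apply this kind of optimization.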

[~vrilleup] We can have a set of built-in preconditions, like positivity. Or we could accept a lambda function for assertions, (T) => Unit, which may be hard for Java users, but they should be familiar with creating those in Spark.
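
A rough Scala sketch of both options, assuming a parameter carries its assertion and the parameter map runs it on every assignment. Param, ParamMap, Preconditions.positive, and "regParam" are hypothetical names used for illustration, not the proposed interfaces.

```scala
// Illustrative sketch only: all names are hypothetical, not the proposed Spark API.
object ParamValidationSketch {
  /** A named parameter with an assertion of type T => Unit.
    * The assertion is expected to throw if the value is invalid. */
  final case class Param[T](name: String, check: T => Unit)

  /** Built-in preconditions that callers can reuse instead of writing lambdas. */
  object Preconditions {
    def positive(name: String): Double => Unit =
      v => require(v > 0.0, s"$name must be positive, got $v")
  }

  /** A parameter map that validates each value before storing it. */
  final class ParamMap {
    private val values = scala.collection.mutable.Map.empty[String, Any]

    def put[T](param: Param[T], value: T): this.type = {
      param.check(value) // throws IllegalArgumentException when require fails
      values(param.name) = value
      this
    }

    def get[T](param: Param[T]): Option[T] =
      values.get(param.name).map(_.asInstanceOf[T])
  }
}
```

With these definitions, `Param[Double]("regParam", Preconditions.positive("regParam"))` uses the built-in check, while a custom assertion is just another lambda passed in the same position.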


> Pipeline and Parameters
> -----------------------
>
>                 Key: SPARK-3530
>                 URL: https://issues.apache.org/jira/browse/SPARK-3530
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Critical
>
> This part of the design doc is for pipelines and parameters. I put the design doc at
> https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing
> I will copy the proposed interfaces to this JIRA later. Some sample code can be viewed at: https://github.com/mengxr/spark-ml/
> Please help review the design and post your comments here. Thanks!


