Posted to issues@spark.apache.org by "Ruben Janssen (JIRA)" <ji...@apache.org> on 2016/07/18 21:16:20 UTC

[jira] [Commented] (SPARK-9120) Add multivariate regression (or prediction) interface

    [ https://issues.apache.org/jira/browse/SPARK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383081#comment-15383081 ] 

Ruben Janssen commented on SPARK-9120:
--------------------------------------

Bumping this JIRA because the recent PR for SPARK-10409 (https://issues.apache.org/jira/browse/SPARK-10409) triggered the same discussion. Given that 10409 is on the roadmap for 2.1 (https://issues.apache.org/jira/browse/SPARK-5575), we should keep the discussion in one place, or at least link this JIRA to 10409.

Regarding the update to the description, which states 'The issue is as follows. RegressionModel extends PredictionModel which has "predict:Double".': unless I am missing something, this seems to be out of date. ClassificationModel in ML seems to extend PredictionModel in the same way RegressionModel does. The solution initially proposed therefore seems sufficient if we want multivariate regression for all regression algorithms that implement the interface. I am not sure that is the case, however; if it is not, I think it would be best to create a separate interface that can then be implemented by the individual algorithms that need it (and, to keep things consistent, let ClassificationModel use it as well: that would not require us to change any code if the naming is consistent).
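
The separate-interface option could look roughly like the sketch below. It is standalone Python, not Spark code: PredictionModel here is only a stand-in for the existing single-output base class, and the names MultivariatePredictionModel, predict_multivariate and LinearMultiOutputModel are hypothetical, chosen to mirror the names used in this issue.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class PredictionModel(ABC):
    """Stand-in for the existing base class: exactly one number per input."""

    @abstractmethod
    def predict(self, features: Sequence[float]) -> float:
        ...


class MultivariatePredictionModel(ABC):
    """Hypothetical separate interface: one list of outputs per input."""

    @abstractmethod
    def predict_multivariate(self, features: Sequence[float]) -> List[float]:
        ...


class LinearMultiOutputModel(PredictionModel, MultivariatePredictionModel):
    """Toy model (assumption, for illustration): one weight row per output."""

    def __init__(self, weights: Sequence[Sequence[float]]) -> None:
        self.weights = [list(row) for row in weights]

    def predict_multivariate(self, features: Sequence[float]) -> List[float]:
        # Dot product of each weight row with the feature vector.
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]

    def predict(self, features: Sequence[float]) -> float:
        # Keep the single-output contract by exposing the first output only.
        return self.predict_multivariate(features)[0]


model = LinearMultiOutputModel([[1.0, 0.0], [0.5, 0.5]])
```

Only the algorithms that need several outputs would implement the extra interface; everything written against the single-output predict keeps working unchanged.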


> Add multivariate regression (or prediction) interface
> -----------------------------------------------------
>
>                 Key: SPARK-9120
>                 URL: https://issues.apache.org/jira/browse/SPARK-9120
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 1.4.0
>            Reporter: Alexander Ulanov
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> org.apache.spark.ml.regression.RegressionModel supports prediction only for a single variable with a method "predict:Double" by extending the Predictor. There is a need for multivariate prediction, at least for regression. I propose to modify "RegressionModel" interface similarly to how it is done in "ClassificationModel", which supports multiclass classification. It has "predict:Double" and "predictRaw:Vector". Analogously, "RegressionModel" should have something like "predictMultivariate:Vector".
> Update: After reading the design docs, adding "predictMultivariate" to RegressionModel does not seem reasonable to me anymore. The issue is as follows. RegressionModel extends PredictionModel which has "predict:Double". Its "train" method uses "predict:Double" for prediction, i.e. PredictionModel (and RegressionModel) is hard-coded to have only one output. There exists a similar problem in MLlib (https://issues.apache.org/jira/browse/SPARK-5362).
> A possible solution to this problem might require redesigning the class hierarchy or adding a separate interface that extends model, though the latter means code duplication.
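
As an illustration of the "redesign the class hierarchy" option mentioned in the description, one possibility is to parameterise the base class by its prediction type so that shared logic is written once. The sketch below is standalone Python under that assumption, not Spark code; GenericPredictionModel, transform_rows and the two toy models are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Sequence, TypeVar

# Prediction type: float for a single output, List[float] for several.
Out = TypeVar("Out")


class GenericPredictionModel(ABC, Generic[Out]):
    """Hypothetical redesigned base class, parameterised by the output type."""

    @abstractmethod
    def predict(self, features: Sequence[float]) -> Out:
        ...

    def transform_rows(self, rows: Sequence[Sequence[float]]) -> List[Out]:
        # Shared logic lives here once and works for any output type.
        return [self.predict(row) for row in rows]


class SumRegressionModel(GenericPredictionModel[float]):
    """Toy single-output model: predicts the sum of the features."""

    def predict(self, features: Sequence[float]) -> float:
        return float(sum(features))


class SumAndMaxRegressionModel(GenericPredictionModel[List[float]]):
    """Toy two-output model: predicts the sum and the maximum of the features."""

    def predict(self, features: Sequence[float]) -> List[float]:
        return [float(sum(features)), float(max(features))]


single = SumRegressionModel()
multi = SumAndMaxRegressionModel()
```

The trade-off is that every existing subclass has to state its output type, which is the kind of hierarchy change the description refers to.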



