
Posted to issues@spark.apache.org by "Nick Pentreath (JIRA)" <ji...@apache.org> on 2016/08/01 07:23:20 UTC

[jira] [Commented] (SPARK-16728) migrate internal API for MLlib trees from spark.mllib to spark.ml

    [ https://issues.apache.org/jira/browse/SPARK-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401639#comment-15401639 ] 

Nick Pentreath commented on SPARK-16728:
----------------------------------------

cc [~sethah]

> migrate internal API for MLlib trees from spark.mllib to spark.ml
> -----------------------------------------------------------------
>
>                 Key: SPARK-16728
>                 URL: https://issues.apache.org/jira/browse/SPARK-16728
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>            Reporter: Vladimir Feinberg
>
> Currently, spark.ml trees rely on spark.mllib implementations. There are two issues with this:
> 1. The GBT TreeBoost algorithm in spark.ml requires storing additional information (for instance, the previous ensemble's prediction) inside the TreePoints. This is necessary to support loss-based splits for complex loss functions.
> 2. The old impurity API only supports summary statistics up to the second order. These are useless for several impurity measures and inadequate for others (e.g., absolute loss or Huber loss), so the API needs to be reworked.
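
To illustrate the second point, here is a minimal sketch in plain Python (not Spark code) of why second-order summary statistics fall short for absolute loss. The names `SecondOrderStats`, `summarize`, and `absolute_loss` are hypothetical and are not part of any Spark API. The sketch assumes the statistics in question are the count, sum, and sum of squares of the labels in a node. Two label sets with identical statistics get the same variance, yet their absolute losses around the median differ.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class SecondOrderStats:
    """Hypothetical aggregator holding count, sum, and sum of squares."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, label: float) -> None:
        # Update the three running statistics with one label.
        self.count += 1
        self.total += label
        self.total_sq += label * label

    def variance(self) -> float:
        # Population variance, recoverable from the three statistics alone.
        mean = self.total / self.count
        return self.total_sq / self.count - mean * mean


def summarize(labels: List[float]) -> SecondOrderStats:
    """Fold a list of labels into second-order statistics."""
    stats = SecondOrderStats()
    for y in labels:
        stats.add(y)
    return stats


def absolute_loss(labels: List[float]) -> float:
    """Sum of absolute deviations from the median; needs the raw labels."""
    m = median(labels)
    return sum(abs(y - m) for y in labels)


# Two label sets chosen so that count, sum, and sum of squares all match.
labels_a = [-5.0, 0.0, 0.0, 5.0]
labels_b = [-4.0, -3.0, 3.0, 4.0]

print(summarize(labels_a) == summarize(labels_b))        # same statistics
print(summarize(labels_a).variance())                    # same variance
print(absolute_loss(labels_a), absolute_loss(labels_b))  # different losses
```

Since both label sets collapse to the same three numbers, no function of those numbers can tell their absolute losses apart. An impurity based on absolute loss would need richer per-node information than this kind of aggregator keeps.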


