Posted to issues@spark.apache.org by "Manish Amde (JIRA)" <ji...@apache.org> on 2014/11/03 22:20:34 UTC

[jira] [Commented] (SPARK-4210) Add Extra-Trees algorithm to MLlib

    [ https://issues.apache.org/jira/browse/SPARK-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195108#comment-14195108 ] 

Manish Amde commented on SPARK-4210:
------------------------------------

[~0asa] Thanks for creating the JIRA.

From the scikit-learn documentation: "As in random forests, a random subset of candidate features is used, but instead of looking for the most discriminative thresholds, thresholds are drawn at random for each candidate feature and the best of these randomly-generated thresholds is picked as the splitting rule. This usually allows to reduce the variance of the model a bit more, at the expense of a slightly greater increase in bias." This might lead to interesting implementation tradeoffs. Could you please discuss how you plan to implement the findBestSplit method for this?
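
To make the quoted rule concrete, here is a minimal single-machine sketch in Python of the split selection it describes. It is only an illustration, not MLlib's findBestSplit or a proposed design: the names gini and find_best_random_split are hypothetical, and it assumes one threshold per candidate feature, drawn uniformly between that feature's observed minimum and maximum at the node, scored by Gini impurity.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def find_best_random_split(rows, labels, num_candidate_features, rng):
    """Hypothetical helper: return (feature, threshold, gain), or None.

    For each feature in a random subset, draw ONE threshold at random
    (assumption: uniform between the feature's min and max at this node)
    and keep the randomly drawn threshold with the largest impurity gain.
    """
    n = len(rows)
    num_features = len(rows[0])
    parent_impurity = gini(labels)
    best = None
    for f in rng.sample(range(num_features), num_candidate_features):
        values = [row[f] for row in rows]
        lo, hi = min(values), max(values)  # a linear scan, no sorting
        if lo == hi:
            continue  # constant feature at this node, nothing to split
        t = rng.uniform(lo, hi)
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:
            continue  # degenerate draw, one side is empty
        gain = (parent_impurity
                - len(left) / n * gini(left)
                - len(right) / n * gini(right))
        if best is None or gain > best[2]:
            best = (f, t, gain)
    return best


if __name__ == "__main__":
    rows = [[0.0, 5.0], [1.0, 5.0], [10.0, 5.0], [11.0, 5.0]]
    labels = [0, 0, 1, 1]
    print(find_best_random_split(rows, labels, 2, random.Random(0)))
```

Note that the sketch only needs the minimum and maximum of each candidate feature, which is where the "no sorting" property comes from. How that maps onto a distributed implementation is exactly the open question above.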

Also, please list the related literature (it's a relatively new algorithm) so that people not familiar with it can judge its suitability for MLlib.

[~mengxr] Could you please assign the ticket to [~0asa]?

> Add Extra-Trees algorithm to MLlib
> ----------------------------------
>
>                 Key: SPARK-4210
>                 URL: https://issues.apache.org/jira/browse/SPARK-4210
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Vincent Botta
>
> This task will add Extra-Trees support to Spark MLlib. The implementation could be modeled on the current Random Forest algorithm. Extra-Trees is expected to be particularly well suited because, unlike the original Random Forest approach, it does not require sorting of attributes (with similar and/or better predictive power).
> The task involves:
> - Code implementation
> - Unit tests
> - Functional tests
> - Performance tests
> - Documentation


