Posted to issues@spark.apache.org by "Mohamed Baddar (JIRA)" <ji...@apache.org> on 2017/03/24 10:35:41 UTC

[jira] [Commented] (SPARK-1548) Add Partial Random Forest algorithm to MLlib

    [ https://issues.apache.org/jira/browse/SPARK-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940130#comment-15940130 ] 

Mohamed Baddar commented on SPARK-1548:
---------------------------------------

[~manishamde] [~sowen] [~josephkb] 
I have some experience contributing to starter tasks in Spark, and I found this issue interesting. I have been looking into the partial implementation of random forests (RF) and found these resources:

https://mahout.apache.org/users/classification/partial-implementation.html
https://github.com/apache/mahout/blob/b5fe4aab22e7867ae057a6cdb1610cfa17555311/mr/src/main/java/org/apache/mahout/classifier/df/mapreduce/partial/package-info.java

I think analyzing the Mahout implementation provides a good basis for studying the partial RF implementation, both in theory and in practice. If this issue is still important to Spark, it would be great if I could start on it. I can begin by writing an analysis document for the current Mahout implementation to assess its performance.

> Add Partial Random Forest algorithm to MLlib
> --------------------------------------------
>
>                 Key: SPARK-1548
>                 URL: https://issues.apache.org/jira/browse/SPARK-1548
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Manish Amde
>
> This task involves creating an alternate approximate random forest implementation where each tree is constructed per partition.
> The task involves:
> - Justifying with theory and experimental results why this algorithm is a good choice.
> - Comparing the various tradeoffs and finalizing the algorithm before implementation
> - Code implementation
> - Unit tests
> - Functional tests
> - Performance tests
> - Documentation
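
For illustration only, here is a minimal sketch of the "each tree is constructed per partition" idea from the issue description. It is not Spark or Mahout code. The names train_stump, train_partial_forest and predict are hypothetical, a one-split decision stump stands in for a full decision tree, and plain Python lists stand in for data partitions. The bootstrap sample and the sqrt(d) feature subset are assumptions borrowed from ordinary random forests, not details confirmed by the issue.

```python
import random
from collections import Counter


def train_stump(rows, rng):
    """Fit a one-split decision stump on (features, label) rows.

    Hypothetical stand-in for a full decision tree: among a random
    subset of features, pick the (feature, threshold) pair with the
    fewest misclassifications on the given rows.
    """
    n_features = len(rows[0][0])
    # Assumption: consider about sqrt(d) randomly chosen features.
    k = max(1, int(n_features ** 0.5))
    candidates = rng.sample(range(n_features), k)
    best = None
    for f in candidates:
        for x, _ in rows:
            t = x[f]  # candidate threshold taken from the data itself
            left = [y for xs, y in rows if xs[f] <= t]
            right = [y for xs, y in rows if xs[f] > t]
            l_label = Counter(left).most_common(1)[0][0]
            r_label = Counter(right).most_common(1)[0][0] if right else l_label
            errors = sum(y != l_label for y in left)
            errors += sum(y != r_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_label, r_label)
    _, f, t, l_label, r_label = best
    return lambda x: l_label if x[f] <= t else r_label


def train_partial_forest(partitions, seed=0):
    """Train one stump per partition; each stump sees only its own partition."""
    rng = random.Random(seed)
    forest = []
    for part in partitions:
        # Bootstrap sample drawn from this partition only (assumption).
        sample = [rng.choice(part) for _ in part]
        forest.append(train_stump(sample, rng))
    return forest


def predict(forest, x):
    """Combine the per-partition models by majority vote."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]
```

Because no model ever sees data outside its partition, the result is only an approximation of a forest trained on the full dataset. How much accuracy that costs for a given partitioning is the tradeoff the task list above asks to justify with theory and experiments.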



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
