Posted to issues@spark.apache.org by "Mohamed Baddar (JIRA)" <ji...@apache.org> on 2017/03/24 10:35:41 UTC
[jira] [Commented] (SPARK-1548) Add Partial Random Forest algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940130#comment-15940130 ]
Mohamed Baddar commented on SPARK-1548:
---------------------------------------
[~manishamde] [~sowen] [~josephkb]
I have some experience contributing to starter tasks in Spark, and I found this issue interesting. I have been looking into the partial implementation of RF and found these resources:
https://mahout.apache.org/users/classification/partial-implementation.html
https://github.com/apache/mahout/blob/b5fe4aab22e7867ae057a6cdb1610cfa17555311/mr/src/main/java/org/apache/mahout/classifier/df/mapreduce/partial/package-info.java
I think analyzing the Mahout implementation provides a good basis for studying the partial RF implementation, both in theory and in practice. If this issue is still important to Spark, it would be great if I could start on it. I can begin by writing an analysis document for the current Mahout implementation to assess its performance.
> Add Partial Random Forest algorithm to MLlib
> --------------------------------------------
>
> Key: SPARK-1548
> URL: https://issues.apache.org/jira/browse/SPARK-1548
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Affects Versions: 1.0.0
> Reporter: Manish Amde
>
> This task involves creating an alternate approximate random forest implementation where each tree is constructed per partition.
> The task involves:
> - Justifying with theory and experimental results why this algorithm is a good choice.
> - Comparing the various tradeoffs and finalizing the algorithm before implementation
> - Code implementation
> - Unit tests
> - Functional tests
> - Performance tests
> - Documentation
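
The per-partition idea in the issue description can be illustrated with a small, library-free Python sketch. It is not the Mahout or Spark implementation: a one-split decision stump stands in for a full tree, there is no bootstrap sampling or feature subsampling, and the names train_stump, train_partial_forest and predict are hypothetical. Each "partition" is just a list of (features, label) rows, each tree sees only its own partition, and the forest predicts by majority vote.

```python
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows):
    """Fit a one-split stump on (features, label) rows from a single partition.

    Assumption: a stump stands in for a full decision tree to keep the sketch short.
    """
    default = majority([y for _, y in rows])
    best_errors, best_model = len(rows) + 1, None
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue  # not a real split
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if errors < best_errors:
                best_errors, best_model = errors, (f, t, l_lab, r_lab)
    if best_model is None:
        # No usable split: always predict the partition's majority label.
        return lambda x: default
    f, t, l_lab, r_lab = best_model
    return lambda x: l_lab if x[f] <= t else r_lab


def train_partial_forest(partitions):
    """Train one tree per non-empty partition, using only that partition's rows."""
    return [train_stump(p) for p in partitions if p]


def predict(forest, x):
    """Combine the per-partition trees by majority vote."""
    return majority([tree(x) for tree in forest])


# Toy data split into three partitions; the label is 1 when the feature exceeds 5.
partitions = [
    [((1.0,), 0), ((2.0,), 0), ((8.0,), 1), ((9.0,), 1)],
    [((0.5,), 0), ((3.0,), 0), ((7.0,), 1)],
    [((4.0,), 0), ((6.0,), 1), ((10.0,), 1)],
]
forest = train_partial_forest(partitions)
print(predict(forest, (1.0,)), predict(forest, (9.0,)))
```

Because no tree ever sees data outside its partition, training needs no shuffle between partitions, but each tree is fit on a smaller and possibly skewed sample. That is the tradeoff the theory and experiment items in the list above are meant to evaluate.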
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)