Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2016/07/25 21:12:20 UTC

[jira] [Created] (SPARK-16719) RandomForest: communicate fewer trees on each iteration

Joseph K. Bradley created SPARK-16719:
-----------------------------------------

             Summary: RandomForest: communicate fewer trees on each iteration
                 Key: SPARK-16719
                 URL: https://issues.apache.org/jira/browse/SPARK-16719
             Project: Spark
          Issue Type: Improvement
          Components: ML
            Reporter: Joseph K. Bradley
            Assignee: Joseph K. Bradley


RandomForest currently sends the entire forest to each worker on each iteration.  There are two causes: (a) the node queue is FIFO, so each iteration tends to split nodes from many different trees, especially early in learning; and (b) the closure references the entire array of trees ({{topNodes}}), so all trees are sent explicitly, whether or not the iteration needs them.

Proposal:
(a) Change the RF node queue to be FILO (a stack), so that RFs tend to focus on one or a few trees before moving on to others.
(b) Change {{topNodes}} to pass only the trees required on that iteration.
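
The effect of the two changes can be illustrated with a toy simulation.  This is only a sketch, not Spark's RandomForest code: the function name trees_per_iteration, the fixed group size, and the growth rule (every node above max_depth splits into exactly two children) are all assumptions made for illustration.  It records which trees own a node in each iteration's group, which is the set of trees that (b) would pass, and lets the pop order be switched between FIFO and FILO as in (a).

```python
from collections import deque


def trees_per_iteration(num_trees, max_depth, group_size, lifo):
    """Toy model (hypothetical, not Spark code): return, for each iteration,
    the set of tree indices that own at least one node in that iteration's group."""
    # Each queue entry is (tree_index, depth); every tree starts with its root.
    queue = deque((tree, 0) for tree in range(num_trees))
    needed = []
    while queue:
        # Take up to group_size nodes: from the back for FILO, the front for FIFO.
        group = []
        while queue and len(group) < group_size:
            group.append(queue.pop() if lifo else queue.popleft())
        # Proposal (b): only trees with a node in this group need to be passed.
        needed.append({tree for tree, _ in group})
        # Assumed growth rule: every node above max_depth splits into two children.
        for tree, depth in group:
            if depth < max_depth:
                queue.append((tree, depth + 1))
                queue.append((tree, depth + 1))
    return needed


fifo = trees_per_iteration(num_trees=4, max_depth=2, group_size=2, lifo=False)
lifo = trees_per_iteration(num_trees=4, max_depth=2, group_size=2, lifo=True)
print("FIFO:", [sorted(s) for s in fifo])
print("FILO:", [sorted(s) for s in lifo])
```

In this toy setup both orders need the same number of iterations, and passing only the needed trees sends far fewer trees than sending all four every time.  The FILO order finishes one tree within the first few iterations, whereas the FIFO order visits every root before any deeper node.  The model says nothing about real savings, which depend on how Spark actually sizes each group of nodes and on the shape of the trees.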




