Posted to issues@spark.apache.org by "Siddharth Murching (JIRA)" <ji...@apache.org> on 2017/10/04 23:36:00 UTC

[jira] [Comment Edited] (SPARK-3162) Train DecisionTree locally when possible

    [ https://issues.apache.org/jira/browse/SPARK-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192201#comment-16192201 ] 

Siddharth Murching edited comment on SPARK-3162 at 10/4/17 11:35 PM:
---------------------------------------------------------------------

Commenting here to note that I'd like to resume work on this issue; I've made a new PR^


was (Author: siddharth murching):
Commenting here to note that I'm resuming work on this issue; I've made a new PR^

> Train DecisionTree locally when possible
> ----------------------------------------
>
>                 Key: SPARK-3162
>                 URL: https://issues.apache.org/jira/browse/SPARK-3162
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Priority: Critical
>
> Improvement: communication
> Currently, every level of a DecisionTree is trained in a distributed manner.  However, at deeper levels in the tree, a given node may match only a small subset of the training data.  If a node’s training data can fit in one machine’s memory, it may be more efficient to shuffle that data to a single machine and train the rest of the subtree rooted at that node locally.
> Note: Local training may become possible at different levels in different branches of the tree.  There are multiple options for handling this case:
> (1) Train in a distributed fashion until all remaining nodes can be trained locally.  This would entail training multiple levels at once (locally).
> (2) Train branches locally when possible, and interleave this with distributed training of the other branches.
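
The two options above differ mainly in when the "fits on one machine" check is applied. The following is a minimal Python sketch of that check only, not Spark's implementation: FrontierNode, fits_locally, all_fit_locally, split_frontier, the fixed bytes-per-row estimate, and the memory budget are all hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FrontierNode:
    """A tree node still waiting to be split (hypothetical structure)."""
    node_id: int
    num_rows: int  # number of training rows that reach this node


def fits_locally(node: FrontierNode, bytes_per_row: int,
                 memory_budget_bytes: int) -> bool:
    # Assumption: memory use is approximated as rows * a fixed row size.
    return node.num_rows * bytes_per_row <= memory_budget_bytes


def all_fit_locally(frontier: List[FrontierNode], bytes_per_row: int,
                    memory_budget_bytes: int) -> bool:
    # Option (1): keep training in a distributed fashion until every
    # remaining node passes the check, then train all subtrees locally.
    return all(fits_locally(n, bytes_per_row, memory_budget_bytes)
               for n in frontier)


def split_frontier(frontier: List[FrontierNode], bytes_per_row: int,
                   memory_budget_bytes: int
                   ) -> Tuple[List[FrontierNode], List[FrontierNode]]:
    # Option (2): nodes that pass the check are trained locally, while the
    # remaining nodes continue with distributed training.
    local = [n for n in frontier
             if fits_locally(n, bytes_per_row, memory_budget_bytes)]
    distributed = [n for n in frontier
                   if not fits_locally(n, bytes_per_row, memory_budget_bytes)]
    return local, distributed


# Illustrative numbers only: 100 bytes per row and a 64 MiB budget.
frontier = [FrontierNode(4, 2_000_000),
            FrontierNode(5, 30_000),
            FrontierNode(6, 500)]
local, distributed = split_frontier(frontier, 100, 64 * 1024 * 1024)
```

With these made-up numbers, node 4 stays in distributed training while nodes 5 and 6 could be handed off for local training under option (2). Under option (1), nothing would switch to local training yet, because all_fit_locally returns False for this frontier.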



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org