Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/04/10 12:50:00 UTC

[jira] [Commented] (SPARK-3723) DecisionTree, RandomForest: Add more instrumentation

    [ https://issues.apache.org/jira/browse/SPARK-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520162#comment-17520162 ] 

Apache Spark commented on SPARK-3723:
-------------------------------------

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/36130

> DecisionTree, RandomForest: Add more instrumentation
> ----------------------------------------------------
>
>                 Key: SPARK-3723
>                 URL: https://issues.apache.org/jira/browse/SPARK-3723
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>              Labels: bulk-closed
>
> Some simple instrumentation would help advanced users understand performance and check whether parameters (such as maxMemoryInMB) need to be tuned.
> Most important instrumentation (simple):
> * min, avg, max nodes per group
> * number of groups (passes over data)
> More advanced instrumentation:
> * For each tree (or averaged over trees), training set accuracy after training each level.  This would be useful for visualizing learning behavior (to convince oneself that model selection was being done correctly).
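
To make the "simple" metrics above concrete, here is a minimal sketch of how per-group counts could be collected and summarized. It is not Spark's API and not the approach taken in the linked pull request: the GroupStats class, its method names, and the sample group sizes are all hypothetical, chosen only for illustration. It assumes that one "group" corresponds to one pass over the data, as the description states.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GroupStats:
    """Hypothetical collector: one recorded entry per group (one pass over the data)."""
    nodes_per_group: List[int] = field(default_factory=list)

    def record_group(self, num_nodes: int) -> None:
        # Call once per pass, with the number of tree nodes handled in that pass.
        self.nodes_per_group.append(num_nodes)

    def summary(self) -> dict:
        # Number of groups equals the number of passes recorded so far.
        n = len(self.nodes_per_group)
        if n == 0:
            empty: Optional[float] = None
            return {"num_groups": 0, "min_nodes": empty,
                    "avg_nodes": empty, "max_nodes": empty}
        return {
            "num_groups": n,
            "min_nodes": min(self.nodes_per_group),
            "avg_nodes": sum(self.nodes_per_group) / n,
            "max_nodes": max(self.nodes_per_group),
        }


stats = GroupStats()
for num_nodes in [1, 2, 4, 3]:  # made-up group sizes, for illustration only
    stats.record_group(num_nodes)
print(stats.summary())
```

Under these assumptions, a large number of groups with few nodes each would hint that the memory budget is limiting how many nodes fit in one pass, which is the kind of signal the description suggests using when tuning maxMemoryInMB.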



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
