Posted to commits@spark.apache.org by sr...@apache.org on 2017/08/10 20:38:06 UTC

spark git commit: [SPARK-21638][ML] Fix RF/GBT Warning message error

Repository: spark
Updated Branches:
  refs/heads/master 95ad960ca -> ca6955858


[SPARK-21638][ML] Fix RF/GBT Warning message error

## What changes were proposed in this pull request?

When training an RF model, many warning messages like this are logged:

> WARN  RandomForest: Tree learning is using approximately 268492800 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 2622 nodes in this iteration.
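
For context (not part of this patch): the 268435456-byte limit in the message is simply the default maxMemoryInMB of 256 expressed in bytes (256 * 1024 * 1024 = 268435456). A minimal sketch, assuming the standard ML API setters, of how a user would raise that budget when the warning is legitimate:

```scala
import org.apache.spark.ml.classification.RandomForestClassifier

// maxMemoryInMB controls the per-iteration aggregation budget;
// internally maxMemoryUsage = maxMemoryInMB * 1024 * 1024 bytes.
val rf = new RandomForestClassifier()
  .setNumTrees(100)
  .setMaxMemoryInMB(1024)  // raise the 256 MB default to 1 GB
```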

This warning message is unnecessary and the numbers it reports are not accurate.

The warning is emitted whenever not all queued nodes can be split in a single iteration. In most cases the queued nodes do not all fit into one iteration, so the warning ends up being logged on almost every iteration.
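
To make this concrete, here is a small self-contained sketch (simplified from the selectNodesToSplit logic, with made-up node sizes) of why the old loop reported inflated numbers: the node that did not fit was still counted into memUsage and numNodesInGroup, so the post-loop check memUsage > maxMemoryUsage fired on essentially every group.

```scala
import scala.collection.mutable

object NodeGroupingSketch {

  // Returns (nodesInGroup, reportedMemUsage, warningWouldFire).
  def selectGroup(nodeSizes: Seq[Long], maxMemoryUsage: Long, patched: Boolean): (Int, Long, Boolean) = {
    val stack = mutable.Stack(nodeSizes: _*)
    var memUsage = 0L
    var numNodesInGroup = 0
    var groupDone = false
    while (stack.nonEmpty && !groupDone && (memUsage < maxMemoryUsage || memUsage == 0)) {
      val nodeMemUsage = stack.top
      if (memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0) {
        // Node fits (or is the first node): add it to the group.
        stack.pop()
        numNodesInGroup += 1
        memUsage += nodeMemUsage
      } else if (patched) {
        // Patched behaviour: stop without counting the node left on the stack.
        groupDone = true
      } else {
        // Old behaviour: the node stays on the stack for the next group,
        // but its size and count were still added, inflating the warning's numbers.
        numNodesInGroup += 1
        memUsage += nodeMemUsage
      }
    }
    (numNodesInGroup, memUsage, memUsage > maxMemoryUsage)
  }

  def main(args: Array[String]): Unit = {
    val sizes = Seq.fill(10)(100L * 1024) // ten queued nodes of ~100 KB each (made-up)
    val limit = 512L * 1024               // 512 KB budget: only 5 nodes actually fit
    println(selectGroup(sizes, limit, patched = false)) // (6,614400,true)  -> spurious warning
    println(selectGroup(sizes, limit, patched = true))  // (5,512000,false) -> no warning
  }
}
```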

## How was this patch tested?
The existing unit tests.

Author: Peng Meng <pe...@intel.com>

Closes #18868 from mpjlu/fixRFwarning.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ca695585
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ca695585
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ca695585

Branch: refs/heads/master
Commit: ca6955858cec868c878a2fd8528dbed0ef9edd3f
Parents: 95ad960
Author: Peng Meng <pe...@intel.com>
Authored: Thu Aug 10 21:38:03 2017 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Thu Aug 10 21:38:03 2017 +0100

----------------------------------------------------------------------
 .../scala/org/apache/spark/ml/tree/impl/RandomForest.scala  | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ca695585/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
index 82e1ed8..f7d969f 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
@@ -1089,7 +1089,8 @@ private[spark] object RandomForest extends Logging {
     var numNodesInGroup = 0
     // If maxMemoryInMB is set very small, we want to still try to split 1 node,
     // so we allow one iteration if memUsage == 0.
-    while (nodeStack.nonEmpty && (memUsage < maxMemoryUsage || memUsage == 0)) {
+    var groupDone = false
+    while (nodeStack.nonEmpty && !groupDone) {
       val (treeIndex, node) = nodeStack.top
       // Choose subset of features for node (if subsampling).
       val featureSubset: Option[Array[Int]] = if (metadata.subsamplingFeatures) {
@@ -1107,9 +1108,11 @@ private[spark] object RandomForest extends Logging {
         mutableTreeToNodeToIndexInfo
           .getOrElseUpdate(treeIndex, new mutable.HashMap[Int, NodeIndexInfo]())(node.id)
           = new NodeIndexInfo(numNodesInGroup, featureSubset)
+        numNodesInGroup += 1
+        memUsage += nodeMemUsage
+      } else {
+        groupDone = true
       }
-      numNodesInGroup += 1
-      memUsage += nodeMemUsage
     }
     if (memUsage > maxMemoryUsage) {
       // If maxMemoryUsage is 0, we should still allow splitting 1 node.

