Posted to commits@spark.apache.org by jk...@apache.org on 2015/07/31 02:26:22 UTC
spark git commit: [SPARK-9077] [MLLIB] Improve error message for decision trees when numExamples < maxCategoriesPerFeature
Repository: spark
Updated Branches:
refs/heads/master 351eda0e2 -> 65fa4181c
[SPARK-9077] [MLLIB] Improve error message for decision trees when numExamples < maxCategoriesPerFeature
Improve error message when number of examples is less than arity of high-arity categorical feature
CC jkbradley: is this about what you had in mind? I know it's a starter task, but it was on my list to close out in the short term.
Author: Sean Owen <so...@cloudera.com>
Closes #7800 from srowen/SPARK-9077 and squashes the following commits:
b8f6cdb [Sean Owen] Improve error message when number of examples is less than arity of high-arity categorical feature
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/65fa4181
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/65fa4181
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/65fa4181
Branch: refs/heads/master
Commit: 65fa4181c35135080870c1e4c1f904ada3a8cf59
Parents: 351eda0
Author: Sean Owen <so...@cloudera.com>
Authored: Thu Jul 30 17:26:18 2015 -0700
Committer: Joseph K. Bradley <jo...@databricks.com>
Committed: Thu Jul 30 17:26:18 2015 -0700
----------------------------------------------------------------------
.../apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/65fa4181/mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
index 380291a..9fe2646 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
@@ -128,9 +128,13 @@ private[spark] object DecisionTreeMetadata extends Logging {
// based on the number of training examples.
if (strategy.categoricalFeaturesInfo.nonEmpty) {
val maxCategoriesPerFeature = strategy.categoricalFeaturesInfo.values.max
+ val maxCategory =
+ strategy.categoricalFeaturesInfo.find(_._2 == maxCategoriesPerFeature).get._1
require(maxCategoriesPerFeature <= maxPossibleBins,
- s"DecisionTree requires maxBins (= $maxPossibleBins) >= max categories " +
- s"in categorical features (= $maxCategoriesPerFeature)")
+ s"DecisionTree requires maxBins (= $maxPossibleBins) to be at least as large as the " +
+ s"number of values in each categorical feature, but categorical feature $maxCategory " +
+ s"has $maxCategoriesPerFeature values. Consider removing this and other categorical " +
+ "features with a large number of values, or add more training examples.")
}
val unorderedFeatures = new mutable.HashSet[Int]()
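To experiment with this check outside Spark, here is a minimal standalone Scala sketch of the validation in the hunk above. It is not the Spark implementation: the object CategoricalBinCheck, the method validate and its parameters are hypothetical names. It also assumes, as the context comment about training examples suggests, that maxPossibleBins is the smaller of maxBins and the number of training examples. Instead of computing the max and then calling find(...).get as the commit does, it uses maxBy to get the feature index and its arity in one pass.

```scala
// Hypothetical standalone sketch of the maxBins vs. categorical-arity check.
// Names are illustrative; this is not Spark's DecisionTreeMetadata code.
object CategoricalBinCheck {

  /** Returns Some(error message) if a categorical feature has more values than
    * the available bins, or None if every feature fits.
    *
    * @param categoricalFeaturesInfo feature index -> number of categories
    * @param maxBins                 requested maximum number of bins
    * @param numExamples             number of training examples
    */
  def validate(
      categoricalFeaturesInfo: Map[Int, Int],
      maxBins: Int,
      numExamples: Long): Option[String] = {
    // Assumption: bins are capped by the number of training examples.
    val maxPossibleBins = math.min(maxBins.toLong, numExamples).toInt
    if (categoricalFeaturesInfo.isEmpty) {
      None
    } else {
      // maxBy returns one (featureIndex, arity) pair with the largest arity.
      val (maxCategory, maxCategoriesPerFeature) = categoricalFeaturesInfo.maxBy(_._2)
      if (maxCategoriesPerFeature <= maxPossibleBins) None
      else Some(
        s"DecisionTree requires maxBins (= $maxPossibleBins) to be at least as large as the " +
        s"number of values in each categorical feature, but categorical feature $maxCategory " +
        s"has $maxCategoriesPerFeature values.")
    }
  }

  def main(args: Array[String]): Unit = {
    // Feature 3 has 50 categories, but only 20 training examples are available.
    println(validate(Map(0 -> 2, 3 -> 50), maxBins = 32, numExamples = 20L))
    // Enough examples and bins: the check passes and prints None.
    println(validate(Map(0 -> 2, 3 -> 30), maxBins = 32, numExamples = 1000L))
  }
}
```

Returning an Option keeps the sketch easy to test. The committed code instead fails fast with require, which throws an IllegalArgumentException carrying the message. When several features tie for the largest arity, either approach may name any one of them.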