Posted to issues@spark.apache.org by "Matthew Tovbin (JIRA)" <ji...@apache.org> on 2018/01/18 19:30:00 UTC

[jira] [Created] (SPARK-23152) Invalid guard condition in org.apache.spark.ml.classification.Classifier

Matthew Tovbin created SPARK-23152:
--------------------------------------

             Summary: Invalid guard condition in org.apache.spark.ml.classification.Classifier
                 Key: SPARK-23152
                 URL: https://issues.apache.org/jira/browse/SPARK-23152
             Project: Spark
          Issue Type: Bug
          Components: ML, MLlib
    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.3.0, 2.3.1
            Reporter: Matthew Tovbin


When fitting a classifier that extends "org.apache.spark.ml.classification.Classifier" (NaiveBayes, DecisionTreeClassifier, RandomForestClassifier, etc.) on an empty dataset, a NullPointerException is thrown instead of a meaningful error.

Steps to reproduce:

{code:java}
import org.apache.spark.ml.classification.DecisionTreeClassifier
import spark.implicits._

val data = spark.createDataset(Seq.empty[(Double, org.apache.spark.ml.linalg.Vector)])

new DecisionTreeClassifier()
  .setLabelCol("_1").setFeaturesCol("_2").fit(data)
{code}
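
Until the guard is fixed, a caller-side pre-check avoids the opaque NPE (a minimal sketch, reusing the `data` value from the snippet above):
{code:java}
// Fail fast with a clear message; take(1) avoids a full count on large inputs.
require(data.take(1).nonEmpty, "Cannot fit a classifier on an empty dataset.")
{code}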

The error:

{code:java}
java.lang.NullPointerException: Value at index 0 is null
at org.apache.spark.sql.Row$class.getAnyValAs(Row.scala:472)
at org.apache.spark.sql.Row$class.getDouble(Row.scala:248)
at org.apache.spark.sql.catalyst.expressions.GenericRow.getDouble(rows.scala:165)
at org.apache.spark.ml.classification.Classifier.getNumClasses(Classifier.scala:115)
at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:102)
at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:45)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
{code}

The problem is caused by an incorrect guard condition in org.apache.spark.ml.classification.Classifier.getNumClasses:

{code:java}
val maxLabelRow: Array[Row] = dataset.select(max($(labelCol))).take(1)
if (maxLabelRow.isEmpty) {
  throw new SparkException("ML algorithm was given empty dataset.")
}
{code}
When the input dataset is empty, the "maxLabelRow" array is not: an aggregation without grouping always returns a single Row, and here its only value is null. The isEmpty guard therefore never fires, and the subsequent getDouble call throws the NullPointerException above. The condition has to be modified to also verify that the value is present:
{code:java}
if (maxLabelRow.isEmpty || maxLabelRow(0).get(0) == null) {
  throw new SparkException("ML algorithm was given empty dataset.")
}
{code}
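
This is easy to confirm directly: a global aggregate always emits exactly one row, with null as the max over zero rows. A minimal sketch (assuming a running `spark` session, as in the reproduction above):
{code:java}
import org.apache.spark.sql.functions.max
import spark.implicits._

val rows = Seq.empty[Double].toDF("label").select(max("label")).take(1)
// rows.length == 1, so an isEmpty check never fires;
// rows(0).isNullAt(0) is true, and rows(0).getDouble(0) throws the NPE above.
{code}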