Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2018/01/24 18:16:00 UTC

[jira] [Resolved] (SPARK-23152) Invalid guard condition in org.apache.spark.ml.classification.Classifier

     [ https://issues.apache.org/jira/browse/SPARK-23152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-23152.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 20321
[https://github.com/apache/spark/pull/20321]

> Invalid guard condition in org.apache.spark.ml.classification.Classifier
> ------------------------------------------------------------------------
>
>                 Key: SPARK-23152
>                 URL: https://issues.apache.org/jira/browse/SPARK-23152
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.3.0, 2.3.1
>            Reporter: Matthew Tovbin
>            Assignee: Matthew Tovbin
>            Priority: Minor
>              Labels: easyfix
>             Fix For: 2.4.0
>
>
> When fitting a classifier that extends "org.apache.spark.ml.classification.Classifier" (e.g. NaiveBayes, DecisionTreeClassifier, RandomForestClassifier) on an empty dataset, a misleading NullPointerException is thrown instead of a descriptive error.
> Steps to reproduce: 
> {code:java}
> val data = spark.createDataset(Seq.empty[(Double, org.apache.spark.ml.linalg.Vector)])
> new DecisionTreeClassifier().setLabelCol("_1").setFeaturesCol("_2").fit(data)
> {code}
>  The error: 
> {code:java}
> java.lang.NullPointerException: Value at index 0 is null
> at org.apache.spark.sql.Row$class.getAnyValAs(Row.scala:472)
> at org.apache.spark.sql.Row$class.getDouble(Row.scala:248)
> at org.apache.spark.sql.catalyst.expressions.GenericRow.getDouble(rows.scala:165)
> at org.apache.spark.ml.classification.Classifier.getNumClasses(Classifier.scala:115)
> at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:102)
> at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:45)
> at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
> {code}
>   
> The problem is caused by an incorrect guard condition in the getNumClasses function at org.apache.spark.ml.classification.Classifier:106:
> {code:java}
> val maxLabelRow: Array[Row] = dataset.select(max($(labelCol))).take(1)
> if (maxLabelRow.isEmpty) {
>   throw new SparkException("ML algorithm was given empty dataset.")
> }
> {code}
> When the input data is empty, the resulting "maxLabelRow" array is not empty. Instead it contains a single Row(null) element, so the isEmpty guard never fires and the subsequent getDouble call fails with the NullPointerException shown above.
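>
> This is easy to confirm: a global aggregation such as max over an empty Dataset produces a single all-null row rather than zero rows. A minimal sketch (assuming a local SparkSession in scope as "spark", e.g. in spark-shell):
> {code:java}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.functions.max
> import spark.implicits._
>
> val empty = spark.createDataset(Seq.empty[(Double, Double)])
> // take(1) returns Array(Row(null)), not an empty array
> val maxRow: Array[Row] = empty.select(max("_1")).take(1)
> assert(maxRow.nonEmpty && maxRow(0).get(0) == null)
> {code}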
>  
> Proposed solution: modify the condition to also verify that the aggregated value is non-null.
> {code:java}
> if (maxLabelRow.isEmpty || maxLabelRow(0).get(0) == null) {
>   throw new SparkException("ML algorithm was given empty dataset.")
> }
> {code}
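>
> An equivalent null check is maxLabelRow(0).isNullAt(0). With the strengthened guard in place, the reproduction above should fail fast with the intended message instead of the opaque NullPointerException. A hypothetical verification, reusing the empty "data" Dataset from the steps above:
> {code:java}
> import org.apache.spark.SparkException
>
> try {
>   new DecisionTreeClassifier().setLabelCol("_1").setFeaturesCol("_2").fit(data)
> } catch {
>   case e: SparkException =>
>     // Expected message: "ML algorithm was given empty dataset."
>     println(e.getMessage)
> }
> {code}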


