Posted to issues@spark.apache.org by "bo song (JIRA)" <ji...@apache.org> on 2016/10/18 09:39:58 UTC

[jira] [Issue Comment Deleted] (SPARK-17987) ML Evaluator fails to handle null values in the dataset

     [ https://issues.apache.org/jira/browse/SPARK-17987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

bo song updated SPARK-17987:
----------------------------
    Comment: was deleted

(was: Thanks for your comments. When the evaluation is part of cross validation, the caller has no way to impute missing values: the input dataset for cross validation has no missing values, but the predictions may. How should this case be handled?

A simple way to handle a missing value is to ignore it before computing the metric; that is, the evaluator excludes any row that contains a missing value.)

> ML Evaluator fails to handle null values in the dataset
> -------------------------------------------------------
>
>                 Key: SPARK-17987
>                 URL: https://issues.apache.org/jira/browse/SPARK-17987
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 1.6.2, 2.0.1
>            Reporter: bo song
>
> Take the RegressionEvaluator as an example: when the predictionCol is null in a row, an exception "scala.MatchError" is thrown. A null prediction is a common case; for example, when a predictor is missing or its value is out of bounds, almost all machine learning models cannot produce a correct prediction, so a null prediction is returned. Evaluators should handle null values instead of throwing an exception; the common way to handle missing values is to ignore them. Besides null values, NaN values need to be handled correctly too.
> The three evaluators RegressionEvaluator, BinaryClassificationEvaluator and MulticlassClassificationEvaluator all have the same problem.
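
The "ignore them" behaviour requested in the description can be illustrated with a minimal sketch. This is plain Python, not Spark code and not the evaluators' actual implementation. The names `is_missing` and `rmse_ignoring_missing` are hypothetical, and returning NaN when no rows remain is an assumption made only for this illustration.

```python
import math
from typing import Iterable, Optional, Tuple

# Hypothetical row shape for this sketch: (prediction, label).
Row = Tuple[Optional[float], Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Return True for None (null) and for floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rmse_ignoring_missing(rows: Iterable[Row]) -> float:
    """Root-mean-squared error over rows where both values are present."""
    # Drop any row whose prediction or label is null or NaN.
    kept = [(p, y) for p, y in rows if not is_missing(p) and not is_missing(y)]
    if not kept:
        # Assumption: with nothing left to score, report NaN instead of raising.
        return float("nan")
    return math.sqrt(sum((p - y) ** 2 for p, y in kept) / len(kept))


# Two of the four rows have an unusable prediction and are skipped.
rows = [(1.0, 1.0), (None, 2.0), (float("nan"), 3.0), (4.0, 2.0)]
print(rmse_ignoring_missing(rows))
```

One consequence of this choice is that the metric is computed over fewer rows than the dataset contains, so a caller may also want to know how many rows were dropped.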



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org