Posted to issues@spark.apache.org by "Boris Clémençon (JIRA)" <ji...@apache.org> on 2016/04/08 13:27:25 UTC

[jira] [Created] (SPARK-14489) RegressionEvaluator returns NaN for ALS in Spark ml

Boris Clémençon  created SPARK-14489:
----------------------------------------

             Summary: RegressionEvaluator returns NaN for ALS in Spark ml
                 Key: SPARK-14489
                 URL: https://issues.apache.org/jira/browse/SPARK-14489
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 1.6.0
         Environment: AWS EMR
            Reporter: Boris Clémençon 


When building a Spark ML pipeline containing an ALS estimator, the RegressionEvaluator metrics "rmse", "mse", "r2" and "mae" all return NaN.

The cause lies in CrossValidator.scala, line 109: the K folds are generated at random. For large, sparse datasets, there is a significant probability that at least one user in the validation set is missing from the training set. The transform method then produces NaN predictions for those users, and the RegressionEvaluator metrics become NaN as well.
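
To illustrate the split problem, here is a minimal sketch in plain Python (not Spark code). The function name, the sample ratings, and the fold assignment are all hypothetical; the fold assignment only approximates a random K-fold split. It shows that a user with a single rating necessarily ends up in one validation fold with no rating in the corresponding training folds.

```python
import random


def users_missing_from_training(ratings, num_folds, seed=0):
    """For each fold, return the validation users that have no rating in the training folds.

    `ratings` is a list of (user, item, rating) tuples. This is an illustrative
    stand-in for a random K-fold split, not the CrossValidator implementation.
    """
    rng = random.Random(seed)
    # Assign every rating to one fold at random.
    folds = [rng.randrange(num_folds) for _ in ratings]
    missing = []
    for k in range(num_folds):
        train_users = {u for (u, _, _), f in zip(ratings, folds) if f != k}
        valid_users = {u for (u, _, _), f in zip(ratings, folds) if f == k}
        # Users seen only in validation cannot be scored by a model fit on training.
        missing.append(valid_users - train_users)
    return missing


# Hypothetical sparse data: user 3 has exactly one rating.
ratings = [(1, 10, 4.0), (1, 11, 3.0), (2, 10, 5.0), (2, 12, 1.0), (3, 11, 2.0)]
missing = users_missing_from_training(ratings, num_folds=3)
```

Whatever the seed, user 3 is missing from the training data in exactly one fold, which is the situation that produces NaN predictions.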

Suggested fix: drop the NaN values when computing the RMSE or other metrics (i.e., remove the users or items in the validation set that are missing from the training set), and log a message when this happens.
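
The suggested behaviour could look roughly like the following sketch, again in plain Python rather than against the Spark API. The helper names `rmse` and `rmse_dropping_nan` and the sample values are assumptions made for illustration. The first function shows how a single NaN prediction makes the metric NaN; the second drops NaN predictions and logs how many were removed.

```python
import logging
import math


def rmse(labels, predictions):
    """Plain RMSE: one NaN prediction makes the whole result NaN."""
    squared = sum((y - p) ** 2 for y, p in zip(labels, predictions))
    return math.sqrt(squared / len(labels))


def rmse_dropping_nan(labels, predictions):
    """RMSE over the pairs whose prediction is not NaN, logging what was dropped."""
    pairs = [(y, p) for y, p in zip(labels, predictions) if not math.isnan(p)]
    dropped = len(labels) - len(pairs)
    if dropped:
        logging.warning(
            "Dropped %d of %d NaN predictions before computing RMSE",
            dropped, len(labels),
        )
    if not pairs:
        # Nothing left to evaluate; NaN is the honest answer here.
        return float("nan")
    return math.sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))


# Hypothetical values: the second prediction is NaN (user unseen in training).
labels = [4.0, 2.0, 5.0]
predictions = [3.0, float("nan"), 5.0]
plain = rmse(labels, predictions)
filtered = rmse_dropping_nan(labels, predictions)
```

One consequence of this design is that the filtered metric is computed on fewer rows than the validation set contains, so the logged count matters when comparing models across folds.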



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
