Posted to issues@spark.apache.org by "Marco Gaido (JIRA)" <ji...@apache.org> on 2018/04/02 10:02:00 UTC

[jira] [Commented] (SPARK-23835) When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1

    [ https://issues.apache.org/jira/browse/SPARK-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422082#comment-16422082 ] 

Marco Gaido commented on SPARK-23835:
-------------------------------------

Actually, this is not the first time we have seen this. Previously, we said that it was a user error: if the data is a nullable Double, you should convert it using {{.as[Option[Double]]}}.
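
As a rough illustration of that advice, here is a minimal sketch that reuses the shape of the reporter's example but converts the nullable column to {{Option[Double]}} instead of {{Double}}. The object name, the local master setting, and the app name are assumptions made only for this example; the null row would be expected to surface as {{None}} rather than as a sentinel value.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, used only to make the sketch self-contained.
object NullableAsOption {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-23835-option-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same shape as the reporter's data: column "a" is a nullable java.lang.Double.
    val df = Seq[(java.lang.Double, Double)](
      (1.0, 2.0),
      (null, 6.0)
    ).toDF("a", "b")

    // Model the nullable column as Option[Double] so a null has a representation
    // on the Scala side instead of being forced into a primitive Double.
    val data = df.as[(Option[Double], Double)]
    data.collect().foreach(println)

    spark.stop()
  }
}
```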

Anyway, enforcing this would mean disallowing the conversion of a nullable value to Double/Int/etc. (throwing an exception during analysis), but this can break existing users' applications (where nulls may not actually be present). Alternatively, we could assert that there are no nulls when converting to a primitive type, which I think is better than the previous option.
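
To make the second option concrete, here is a user-side approximation of it, not a change to Spark itself: check for nulls explicitly before calling {{as}} with a primitive type. The helper {{requireNoNulls}} and the object name are hypothetical and exist only for this sketch; an in-engine assertion would presumably happen during deserialization rather than as a separate pass over the data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical wrapper object, used only to make the sketch self-contained.
object AssertNoNullsBeforeAs {

  // Hypothetical helper (not a Spark API): fails fast if `column` contains a null.
  // It runs an extra job over the data, so it is a sketch, not a tuned solution.
  def requireNoNulls(df: DataFrame, column: String): DataFrame = {
    val hasNull = df.filter(col(column).isNull).head(1).nonEmpty
    require(!hasNull, s"column '$column' contains null; use Option[...] instead of a primitive type")
    df
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-23835-assert-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq[(java.lang.Double, Double)](
      (1.0, 2.0),
      (null, 6.0)
    ).toDF("a", "b")

    try {
      // Only convert to the primitive type once the null check has passed.
      val data = requireNoNulls(df, "a").as[(Double, Double)]
      data.collect().foreach(println)
    } catch {
      case e: IllegalArgumentException =>
        println(s"Refusing to convert: ${e.getMessage}")
    } finally {
      spark.stop()
    }
  }
}
```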

> When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23835
>                 URL: https://issues.apache.org/jira/browse/SPARK-23835
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Joseph K. Bradley
>            Priority: Major
>
> I constructed a DataFrame with a nullable java.lang.Double column (and an extra Double column).  I then converted it to a Dataset using ```as[(Double, Double)]```.  When the Dataset is shown, it has a null.  When it is collected and printed, the null is silently converted to a -1.
> Code snippet to reproduce this:
> {code}
> val localSpark = spark
> import localSpark.implicits._
> val df = Seq[(java.lang.Double, Double)](
>   (1.0, 2.0),
>   (3.0, 4.0),
>   (Double.NaN, 5.0),
>   (null, 6.0)
> ).toDF("a", "b")
> df.show()  // OUTPUT 1: has null
> df.printSchema()
> val data = df.as[(Double, Double)]
> data.show()  // OUTPUT 2: has null
> data.collect().foreach(println)  // OUTPUT 3: has -1
> {code}
> OUTPUT 1 and 2:
> {code}
> +----+---+
> |   a|  b|
> +----+---+
> | 1.0|2.0|
> | 3.0|4.0|
> | NaN|5.0|
> |null|6.0|
> +----+---+
> {code}
> OUTPUT 3:
> {code}
> (1.0,2.0)
> (3.0,4.0)
> (NaN,5.0)
> (-1.0,6.0)
> {code}


