Posted to issues@spark.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2019/02/26 05:57:00 UTC

[jira] [Commented] (SPARK-26984) Incompatibility between Spark releases - Some(null)

    [ https://issues.apache.org/jira/browse/SPARK-26984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777609#comment-16777609 ] 

Jungtaek Lim commented on SPARK-26984:
--------------------------------------

IMHO, the behavior of Spark 2.2 looks wrong, and the current behavior looks correct to me. Consider how Spark should interpret the two values: Some(null) and None are literally not the same. If `None` maps to null, then what is the correct value for `Some(null)`?

I'd recommend using "Option()" if you're uncertain of the nullability of a value.
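
To illustrate the difference, here is a minimal sketch in plain Scala with no Spark dependency. The object name `OptionNullDemo` and the value names are made up for illustration only.

```scala
object OptionNullDemo {
  // Stand-in for a value whose nullability is uncertain.
  val raw: String = null

  // Some(...) wraps whatever it is given, including null.
  val wrapped: Option[String] = Some(raw)

  // Option(...) checks for null and yields None in that case.
  val safe: Option[String] = Option(raw)

  def main(args: Array[String]): Unit = {
    println(wrapped.isDefined) // true: the Option is "present" but holds null
    println(safe.isDefined)    // false: null was converted to None
  }
}
```

Applied to the snippet in the report, the second row would presumably become (2, Option(null: String), Some(2)), so that the column holds None rather than a Some wrapping null. That variant is an assumption and has not been verified against Spark 2.4.0 here.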

> Incompatibility between Spark releases - Some(null) 
> ----------------------------------------------------
>
>                 Key: SPARK-26984
>                 URL: https://issues.apache.org/jira/browse/SPARK-26984
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>         Environment: Linux CentOS, Databricks.
>            Reporter: Gerard Alexander
>            Priority: Minor
>              Labels: newbie
>             Fix For: 2.4.1, 2.4.2
>
>
> Please refer to [https://stackoverflow.com/questions/54851205/why-does-somenull-throw-nullpointerexception-in-spark-2-4-but-worked-in-2-2/54861152#54861152].
> NB: Not sure of priority being correct - no doubt one will evaluate.
> Consider the following:
> {{val df = Seq( }}
> {{  (1, Some("a"), Some(1)), }}
> {{  (2, Some(null), Some(2)), }}
> {{  (3, Some("c"), Some(3)), }}
> {{  (4, None, None) ).toDF("c1", "c2", "c3")}}
> In Spark 2.2.1 (on MapR) the Some(null) works fine; in Spark 2.4.0 on Databricks an error ensues:
> {{java.lang.RuntimeException: Error while encoding: java.lang.NullPointerException}}
> {{assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._1 AS _1#6}}
> {{staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, unwrapoption(ObjectType(class java.lang.String), assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._2), true, false) AS _2#7}}
> {{unwrapoption(IntegerType, assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._3) AS _3#8}}
> {{  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:293)}}
> {{  at org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:472)}}
> {{  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)}}
> {{  at scala.collection.immutable.List.foreach(List.scala:388)}}
> {{  at scala.collection.TraversableLike.map(TraversableLike.scala:233)}}
> {{  at scala.collection.TraversableLike.map$(TraversableLike.scala:226)}}
> {{  at scala.collection.immutable.List.map(List.scala:294)}}
> {{  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:472)}}
> {{  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:377)}}
> {{  at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:228)}}
> {{  ... 57 elided}}
> {{Caused by: java.lang.NullPointerException}}
> {{  at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:109)}}
> {{  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)}}
> {{  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:289)}}
> {{  ... 66 more}}
>  
> You can argue it is solvable otherwise, but there may well be an existing code base that could be affected.
>  
>  



