Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2016/11/04 00:27:59 UTC

[jira] [Commented] (SPARK-18260) from_json can throw a better exception when it can't find the column or be nullSafe

    [ https://issues.apache.org/jira/browse/SPARK-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634766#comment-15634766 ] 

Michael Armbrust commented on SPARK-18260:
------------------------------------------

We should return null if the input is null.
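
A minimal sketch of what that could look like in {{JsonToStruct.eval}} (the {{parseJson}} helper and member names here are hypothetical stand-ins, not the actual Spark internals):

{code}
// Hypothetical sketch only: parseJson stands in for the real parsing logic.
override def eval(input: InternalRow): Any = {
  val json = child.eval(input)
  if (json == null) {
    // Null in, null out: a null or missing input column yields a null
    // struct instead of the NullPointerException in the stack trace below.
    null
  } else {
    parseJson(json.toString)
  }
}
{code}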

> from_json can throw a better exception when it can't find the column or be nullSafe
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-18260
>                 URL: https://issues.apache.org/jira/browse/SPARK-18260
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Burak Yavuz
>            Priority: Blocker
>
> I got this exception:
> {code}
> SparkException: Job aborted due to stage failure: Task 0 in stage 13028.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13028.0 (TID 74170, 10.0.138.84, executor 2): java.lang.NullPointerException
> 	at org.apache.spark.sql.catalyst.expressions.JsonToStruct.eval(jsonExpressions.scala:490)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$$anonfun$create$2.apply(GeneratePredicate.scala:71)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$$anonfun$create$2.apply(GeneratePredicate.scala:71)
> 	at org.apache.spark.sql.execution.FilterExec$$anonfun$17$$anonfun$apply$2.apply(basicPhysicalOperators.scala:211)
> 	at org.apache.spark.sql.execution.FilterExec$$anonfun$17$$anonfun$apply$2.apply(basicPhysicalOperators.scala:210)
> 	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:804)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:804)
> {code}
> This was because the column that I called `from_json` on didn't exist for all of my rows. Either `from_json` should be null-safe, or it should fail with a better error message.
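
A minimal reproduction along these lines (the column name {{payload}} and the schema are illustrative; run in spark-shell so the {{toDF}} and {{$}} implicits are in scope):

{code}
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StructType}

val schema = new StructType().add("a", IntegerType)

// One row with valid JSON, one where the column is null.
val df = Seq(Some("""{"a": 1}"""), None).toDF("payload")

// Without a null-safe from_json this can hit the NullPointerException
// above when the expression is evaluated; with the proposed fix, the
// null row should simply produce a null struct.
df.select(from_json($"payload", schema)).show()
{code}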



