Posted to issues@spark.apache.org by "Marek Novotny (JIRA)" <ji...@apache.org> on 2018/06/29 12:34:00 UTC
[jira] [Comment Edited] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527556#comment-16527556 ]
Marek Novotny edited comment on SPARK-24165 at 6/29/18 12:33 PM:
-----------------------------------------------------------------
It seems that Spark is not able to resolve nullability for nested types correctly.
{{val rows = new util.ArrayList[Row]()}}
{{rows.add(Row(true, ("1", 1)))}}
{{rows.add(Row(false, (null, 2)))}}
{{val schema = StructType(Seq(}}
{{  StructField("cond", BooleanType, false),}}
{{  StructField("s", StructType(Seq(}}
{{    StructField("val1", StringType, true),}}
{{    StructField("val2", IntegerType, false)}}
{{  )))}}
{{))}}
{{val df = spark.createDataFrame(rows, schema)}}
{{df.select(when('cond, expr("struct('x' as val1, 10 as val2)")).otherwise('s) as "result").printSchema()}}
Result:
{{root}}
{{ |-- result: struct (nullable = true)}}
{{ |    |-- val1: string (nullable = *{color:#ff0000}false{color}*)}}
{{ |    |-- val2: integer (nullable = false)}}
I will take a look at the problem.
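To make the expected result concrete, here is a minimal, Spark-free sketch of the rule one would presumably want when the two branches of when().otherwise() are merged: a nested field should be nullable if it is nullable in either branch. The Field and Struct classes and the merge function are hypothetical names invented for this illustration. They are not Spark's classes and say nothing about how Spark implements this.

```scala
// Hypothetical, Spark-free model of a struct schema; these are NOT Spark's classes.
final case class Field(name: String, dataType: String, nullable: Boolean)
final case class Struct(fields: Seq[Field])

object NullabilityMergeSketch {
  // Assumed rule: a field of the merged type is nullable if it is
  // nullable in at least one of the two branches.
  def merge(left: Struct, right: Struct): Struct = {
    require(
      left.fields.map(f => (f.name, f.dataType)) ==
        right.fields.map(f => (f.name, f.dataType)),
      "both branches must have the same field names and types")
    Struct(left.fields.zip(right.fields).map { case (l, r) =>
      l.copy(nullable = l.nullable || r.nullable)
    })
  }

  def main(args: Array[String]): Unit = {
    // when(...) branch: struct('x' as val1, 10 as val2), built from non-null literals.
    val whenBranch = Struct(Seq(
      Field("val1", "string", nullable = false),
      Field("val2", "integer", nullable = false)))
    // otherwise(...) branch: column s as declared in the schema above.
    val otherwiseBranch = Struct(Seq(
      Field("val1", "string", nullable = true),
      Field("val2", "integer", nullable = false)))

    merge(whenBranch, otherwiseBranch).fields.foreach { f =>
      println(s"${f.name}: ${f.dataType} (nullable = ${f.nullable})")
    }
  }
}
```

Under this assumed rule val1 comes out as nullable = true, which is what the second row, (null, 2), requires, rather than the nullable = false shown in the result above.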
was (Author: mn-mikke):
It seems that Spark is not able to resolve nullability for nested types correctly.
{{val rows = new util.ArrayList[Row]()}}
{{rows.add(Row(true, ("1", 1)))}}
{{rows.add(Row(false, (null, 2)))}}
{{val schema = StructType(Seq(}}
{{ StructField("cond", BooleanType, false),}}
{{ StructField("s", StructType(Seq(}}
{{ StructField("val1", StringType, true),}}
{{ StructField("val2", IntegerType, false)}}
{{ )))}}
{{))}}
{{val df = spark.createDataFrame(rows, schema)}}
{{df.select(when('cond, expr("struct('x' as val1, 10 as val2)")).otherwise('s) as "result").printSchema()}}
Result:
{{root}}
{{ |-- result: struct (nullable = true)}}
{{ | |-- val1: string (nullable = *{color:#FF0000}false{color}*)}}
{{ | |-- val2: integer (nullable = false)}}
I will take a look at the problem.
> UDF within when().otherwise() raises NullPointerException
> ---------------------------------------------------------
>
> Key: SPARK-24165
> URL: https://issues.apache.org/jira/browse/SPARK-24165
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Jingxuan Wang
> Priority: Major
>
> I have a UDF that takes a java.sql.Timestamp column and a String column as input and returns an Array of (Seq[case class], Double) as output. Since some values in the input columns can be null, I put the UDF inside a when($input.isNull, null).otherwise(UDF) expression. This works well when I test it in the Spark shell. But when run as a Scala jar via spark-submit in YARN cluster mode, it raises a NullPointerException that points to the UDF function. If I remove the when().otherwise() condition and put the null check inside the UDF instead, the function works without issue in spark-submit.
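The workaround the reporter describes, a null check inside the function body, can be sketched in plain Scala. The describe function below is a hypothetical stand-in for the reporter's UDF body, whose real logic and return type are not shown in the report. Wrapping the function with Spark's udf helper is left out so that the sketch has no dependency beyond the JDK.

```scala
import java.sql.Timestamp

object NullSafeUdfBodySketch {
  // Hypothetical stand-in for the reporter's UDF body; it throws a
  // NullPointerException if either argument is null.
  def describe(ts: Timestamp, label: String): String =
    label.trim + "@" + ts.getTime

  // Null check moved inside the function: each argument is wrapped in
  // Option, so a null input yields None instead of an exception.
  def describeNullSafe(ts: Timestamp, label: String): Option[String] =
    for {
      t <- Option(ts)
      l <- Option(label)
    } yield describe(t, l)

  def main(args: Array[String]): Unit = {
    val ts = new Timestamp(0L)
    println(describeNullSafe(ts, " a "))  // Some(a@0)
    println(describeNullSafe(null, "a"))  // None
    println(describeNullSafe(ts, null))   // None
  }
}
```

With the check inside the function, the result no longer depends on whether the surrounding when().otherwise() expression keeps null rows away from the function.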