Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/17 19:39:29 UTC

[GitHub] [spark] bersprockets opened a new pull request, #36903: [SPARK-39496][SQL] Handle null struct in `Inline.eval`

bersprockets opened a new pull request, #36903:
URL: https://github.com/apache/spark/pull/36903

   ### What changes were proposed in this pull request?
   
   Change `Inline.eval` to return a row of null values rather than a null row in the case of a null input struct.
   
   ### Why are the changes needed?
   
   Consider the following query:
   ```
   set spark.sql.codegen.wholeStage=false;
   select inline(array(named_struct('a', 1, 'b', 2), null));
   ```
   This query fails with a `NullPointerException`:
   ```
   22/06/16 15:10:06 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
   java.lang.NullPointerException
   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
   	at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$11(GenerateExec.scala:122)
   ```
   (In Spark 3.1.3, you don't need to set `spark.sql.codegen.wholeStage` to false to reproduce the error, since Spark 3.1.3 has no codegen path for `Inline`.)
   
   This query fails regardless of the setting of `spark.sql.codegen.wholeStage`:
   ```
   val dfWide = (Seq((1))
     .toDF("col0")
     .selectExpr(Seq.tabulate(99)(x => s"$x as col${x + 1}"): _*))
   
   val df = (dfWide
     .selectExpr("*", "array(named_struct('a', 1, 'b', 2), null) as struct_array"))
   
   df.selectExpr("*", "inline(struct_array)").collect
   ```
   It fails with:
   ```
   22/06/16 15:18:55 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
   java.lang.NullPointerException
   	at org.apache.spark.sql.catalyst.expressions.JoinedRow.isNullAt(JoinedRow.scala:80)
   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_8$(Unknown Source)
   ```
   When `Inline.eval` returns a null row in the collection, `GenerateExec` throws a `NullPointerException` either when joining the null row with the required child output or when projecting the null row.
   
   This PR avoids producing the null row and produces a row of null values instead:
   ```
   spark-sql> set spark.sql.codegen.wholeStage=false;
   spark.sql.codegen.wholeStage	false
   Time taken: 3.095 seconds, Fetched 1 row(s)
   spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null));
   1	2
   NULL	NULL
   Time taken: 1.214 seconds, Fetched 2 row(s)
   spark-sql>
   ```
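   
   The difference between a null row and a row of null values can be sketched outside Spark. The following is a minimal illustration only, not the code in this PR: `Row`, `inlineRows`, and `render` are hypothetical names, and a plain `Array[Any]` stands in for Spark's internal row type. It assumes the consumer dereferences every row, as a join or projection would.
   
   ```scala
   object InlineNullSketch {
     // Hypothetical stand-in for a row: a fixed-width array of nullable values.
     // This models the idea only; it is not Spark's row representation.
     type Row = Array[Any]
   
     // Expand an array of structs into rows. A null struct becomes a row whose
     // fields are all null, so the consumer never receives a null row reference.
     def inlineRows(structs: Seq[Row], numFields: Int): Seq[Row] = {
       val allNulls: Row = Array.fill[Any](numFields)(null)
       structs.map(s => if (s == null) allNulls else s)
     }
   
     // Consumer that dereferences each row. Passing it a null row would throw
     // a NullPointerException at `r.map`.
     def render(rows: Seq[Row]): Seq[String] =
       rows.map(r => r.map(v => if (v == null) "NULL" else v.toString).mkString("\t"))
   
     def main(args: Array[String]): Unit = {
       val input: Seq[Row] = Seq(Array[Any](1, 2), null)
       render(inlineRows(input, numFields = 2)).foreach(println)
     }
   }
   ```
   
   Calling `render(input)` directly, without `inlineRows`, fails on the second element, which mirrors the failure mode described above.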
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit test.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #36903: [SPARK-39496][SQL] Handle null struct in `Inline.eval`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #36903: [SPARK-39496][SQL] Handle null struct in `Inline.eval`
URL: https://github.com/apache/spark/pull/36903




[GitHub] [spark] bersprockets commented on pull request #36903: [SPARK-39496][SQL] Handle null struct in `Inline.eval`

Posted by GitBox <gi...@apache.org>.
bersprockets commented on PR #36903:
URL: https://github.com/apache/spark/pull/36903#issuecomment-1159327208

   Thanks!
   
   > Do we need this for branch-3.1 too? (Judging from the JIRA version, yes.) It has a conflict, so feel free to create a backport PR.
   
   Will do.
   
   




[GitHub] [spark] HyukjinKwon commented on pull request #36903: [SPARK-39496][SQL] Handle null struct in `Inline.eval`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36903:
URL: https://github.com/apache/spark/pull/36903#issuecomment-1159317851

   Merged to master, branch-3.3 and branch-3.2.
   
   @bersprockets Do we need this for branch-3.1 too? (Judging from the JIRA version, yes.) It has a conflict, so feel free to create a backport PR.

