Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/19 17:23:05 UTC

[GitHub] [spark] xkrogen commented on a diff in pull request #37634: [SPARK-40199][SQL] Provide useful error when projecting a non-null column encounters null value

xkrogen commented on code in PR #37634:
URL: https://github.com/apache/spark/pull/37634#discussion_r974499166


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala:
##########
@@ -252,28 +267,44 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
      """.stripMargin
   }
 
+  /**
+   * Wrap `inputExpr` in a try-catch block that will catch any [[NullPointerException]] that is
+   * thrown, instead throwing a (more helpful) error message as provided by
+   * [[org.apache.spark.sql.errors.QueryExecutionErrors.valueCannotBeNullError]].
+   */
+  private def wrapWithNpeHandling(inputExpr: String, descPath: Seq[String]): String =
+    s"""
+       |try {
+       |  ${inputExpr.trim}

Review Comment:
   I prefer exception-catching since it adds essentially zero overhead on the common (non-null) path: the JVM only pays for a try block when an exception is actually thrown. Adding a null check here essentially falls back to the logic already used for a nullable schema:
   https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala#L119-L133
   The benchmark results show that the null check carries nontrivial overhead; for the simple case of projecting a single primitive column, it is almost 50%:
   https://github.com/apache/spark/blob/2a1f9767213c321bd52e7714fa3b5bfc4973ba40/sql/catalyst/benchmarks/UnsafeProjectionBenchmark-jdk17-results.txt#L9-L10
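   
   To make the trade-off concrete, here is a rough sketch of the two codegen shapes being compared. This is not the actual GenerateUnsafeProjection output; `input`, `setNullAt`, `writeField`, and `errorCall` are hypothetical placeholders for the generated accessor, writer calls, and error constructor:
   
   ```scala
   // Sketch only: simplified stand-ins for the generated Java, written in the
   // same string-interpolation style GenerateUnsafeProjection uses.
   def nullCheckShape(input: String, setNullAt: String, writeField: String): String =
     s"""
        |if ($input == null) {  // branch evaluated for every row
        |  $setNullAt;
        |} else {
        |  $writeField;
        |}
        |""".stripMargin
   
   def npeHandlingShape(writeField: String, errorCall: String): String =
     s"""
        |try {                  // no per-row cost unless an NPE is actually thrown
        |  $writeField;
        |} catch (NullPointerException npe) {
        |  throw $errorCall;    // rethrow with a descriptive error message
        |}
        |""".stripMargin
   ```
   
   The null-check shape pays for the branch on every row even when the data never actually contains a null, while the try/catch shape only pays when a null slips through.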
   
   You call out the case where a null is silently replaced with a default value; that's a good point, and I'm not sure how we can handle it without the additional overhead of an explicit check. The default-value substitution appears to come from [Scala's own unboxing logic](https://github.com/scala/scala/blob/986dcc160aab85298f6cab0bf8dd0345497cdc01/src/library/scala/runtime/BoxesRunTime.java#L102).
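   
   For reference, the behavior is easy to reproduce in plain Scala (nothing Spark-specific); casting a null reference to a primitive goes through the linked BoxesRunTime logic, which substitutes the type's default value instead of throwing:
   
   ```scala
   // Unboxing a null reference yields the primitive's default (0 for Int), not an NPE.
   val boxed: Any = null
   val unboxed: Int = boxed.asInstanceOf[Int]                         // 0, no exception
   val viaRuntime: Int = scala.runtime.BoxesRunTime.unboxToInt(null)  // also 0
   assert(unboxed == 0 && viaRuntime == 0)
   ```
   
   So when the generated code unboxes a value it assumes is non-null, a null comes out as 0/false/0.0 rather than failing, which is the silent default-value replacement you describe.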



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

