Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2022/03/31 13:39:00 UTC

[jira] [Resolved] (SPARK-38333) DPP cause DataSourceScanExec java.lang.NullPointerException

     [ https://issues.apache.org/jira/browse/SPARK-38333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-38333.
---------------------------------
    Fix Version/s: 3.3.0
                   3.2.2
                   3.1.3
       Resolution: Fixed

Issue resolved by pull request 36012
[https://github.com/apache/spark/pull/36012]

> DPP cause DataSourceScanExec java.lang.NullPointerException
> -----------------------------------------------------------
>
>                 Key: SPARK-38333
>                 URL: https://issues.apache.org/jira/browse/SPARK-38333
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2
>            Reporter: jiahong.li
>            Assignee: jiahong.li
>            Priority: Major
>             Fix For: 3.3.0, 3.2.2, 3.1.3
>
>
> In DPP, we hit an NPE like the one below:
> Caused by: java.lang.NullPointerException
>     at org.apache.spark.sql.execution.DataSourceScanExec.$init$(DataSourceScanExec.scala:57)
>     at org.apache.spark.sql.execution.FileSourceScanExec.<init>(DataSourceScanExec.scala:172)
> ...
>     at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:56)
>     at org.apache.spark.sql.catalyst.expressions.Predicate$.create(predicates.scala:101)
>     at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2(basicPhysicalOperators.scala:246)
>     at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2$adapted(basicPhysicalOperators.scala:245)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:885)
> The root cause is the addExprTree function in EquivalentExpressions:
> ```
> def addExprTree(
>     expr: Expression,
>     addFunc: Expression => Boolean = addExpr): Unit = {
>   val skip = expr.isInstanceOf[LeafExpression] ||
>     // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
>     // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
>     expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
>     // `PlanExpression` wraps a query plan. Comparing query plans of `PlanExpression` on the
>     // executor can cause errors like an NPE.
>     (expr.isInstanceOf[PlanExpression[_]] && TaskContext.get != null)
>
>   if (!skip && !addFunc(expr)) {
>     childrenToRecurse(expr).foreach(addExprTree(_, addFunc))
>     commonChildrenToRecurse(expr).filter(_.nonEmpty).foreach(addCommonExprs(_, addFunc))
>   }
> }
> ```
> Maybe we should change it like this:
> ```
> (expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined && TaskContext.get != null)
> ```
> because in DPP the filter expression looks like this:
> DynamicPruningExpression(InSubqueryExec(value, broadcastValues, exprId))
> So we should iterate over the children: if a PlanExpression such as InSubqueryExec is found, we should skip addExprTree, and then the NPE will not appear.
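>
> To make the difference between the two checks concrete, here is a minimal standalone sketch in plain Scala. It does not use the real Catalyst API: Expr, Leaf, PlanExpr, DynamicPruning, and the onExecutor flag are illustrative stand-ins for Expression, InSubqueryExec, DynamicPruningExpression, and TaskContext.get != null. It only shows that a root-only isInstanceOf test lets the DPP filter through, while a tree-wide find-based test skips it.
> ```
> // Illustrative stand-ins only; these are NOT Spark's Catalyst classes.
> sealed trait Expr {
>   def children: Seq[Expr]
>   // Mimics Expression.find: pre-order search over the expression tree.
>   def find(p: Expr => Boolean): Option[Expr] =
>     if (p(this)) Some(this) else children.view.flatMap(_.find(p)).headOption
> }
> case class Leaf(name: String) extends Expr { val children: Seq[Expr] = Nil }
> // Stand-in for a PlanExpression such as InSubqueryExec.
> case class PlanExpr(name: String, children: Seq[Expr]) extends Expr
> // Stand-in for DynamicPruningExpression, a non-leaf wrapper around the subquery.
> case class DynamicPruning(child: Expr) extends Expr { val children: Seq[Expr] = Seq(child) }
>
> object SkipCheckDemo extends App {
>   // Shape of the DPP filter from the description: DynamicPruningExpression(InSubqueryExec(...)).
>   val dppFilter: Expr = DynamicPruning(PlanExpr("InSubqueryExec", Seq(Leaf("value"))))
>   val onExecutor = true // stand-in for TaskContext.get != null
>
>   // Current check: looks only at the root, so the nested plan expression is not skipped
>   // and addExprTree recurses into it.
>   val skipOld = dppFilter.isInstanceOf[PlanExpr] && onExecutor
>   // Proposed check: searches the whole tree, so the subtree is skipped on the executor.
>   val skipNew = dppFilter.find(_.isInstanceOf[PlanExpr]).isDefined && onExecutor
>
>   println(s"root-only check skips: $skipOld") // false
>   println(s"tree-wide check skips: $skipNew") // true
> }
> ```
> Run as a script, the sketch prints false for the current root-only check and true for the proposed tree-wide check, matching the behavior described above.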



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
