Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2022/03/31 13:39:00 UTC
[jira] [Assigned] (SPARK-38333) DPP cause DataSourceScanExec java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-38333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-38333:
-----------------------------------
Assignee: jiahong.li
> DPP cause DataSourceScanExec java.lang.NullPointerException
> -----------------------------------------------------------
>
> Key: SPARK-38333
> URL: https://issues.apache.org/jira/browse/SPARK-38333
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2
> Reporter: jiahong.li
> Assignee: jiahong.li
> Priority: Major
>
> With dynamic partition pruning (DPP), we hit an NPE like the one below:
> Caused by: java.lang.NullPointerException
> at org.apache.spark.sql.execution.DataSourceScanExec.$init$(DataSourceScanExec.scala:57)
> at org.apache.spark.sql.execution.FileSourceScanExec.<init>(DataSourceScanExec.scala:172)
> ...
> at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:56)
> at org.apache.spark.sql.catalyst.expressions.Predicate$.create(predicates.scala:101)
> at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2(basicPhysicalOperators.scala:246)
> at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2$adapted(basicPhysicalOperators.scala:245)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:885)
> The root cause is the addExprTree function in EquivalentExpressions:
> ```
> def addExprTree(
>     expr: Expression,
>     addFunc: Expression => Boolean = addExpr): Unit = {
>   val skip = expr.isInstanceOf[LeafExpression] ||
>     // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
>     // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
>     expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
>     // `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
>     // can cause error like NPE.
>     (expr.isInstanceOf[PlanExpression[_]] && TaskContext.get != null)
>   if (!skip && !addFunc(expr)) {
>     childrenToRecurse(expr).foreach(addExprTree(_, addFunc))
>     commonChildrenToRecurse(expr).filter(_.nonEmpty).foreach(addCommonExprs(_, addFunc))
>   }
> }
> ```
> Maybe we should change the last condition to search the whole expression tree, like this:
> ```
> (expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined && TaskContext.get != null)
> ```
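> This is because, in DPP, the filter expression looks like this:
> DynamicPruningExpression(InSubqueryExec(value, broadcastValues, exprId))
> Here the PlanExpression (InSubqueryExec) is a child of the expression rather than the expression itself, so the current isInstanceOf check on the root does not match. We should search the children as well: if a PlanExpression such as InSubqueryExec is found anywhere in the tree, addExprTree should skip the expression, and the NPE no longer occurs.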
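
The difference between the two checks can be sketched without Spark. The block below is a minimal illustration, not Catalyst code: `Expr`, `PlanExpr`, `Attr`, `InSubquery`, `DynamicPruning`, and the `onExecutor` flag are hypothetical stand-ins for `Expression`, `PlanExpression`, an attribute reference, `InSubqueryExec`, `DynamicPruningExpression`, and `TaskContext.get != null`. It only shows that a root-level `isInstanceOf` test misses a plan expression nested under a wrapper, while a `find` over the subtree catches it.

```scala
// Toy model only: none of these types are Spark's real classes.
object DppSkipSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    // Pre-order search over the tree, similar in spirit to a tree-node find.
    def find(p: Expr => Boolean): Option[Expr] =
      if (p(this)) Some(this)
      else children.foldLeft(Option.empty[Expr])((acc, c) => acc.orElse(c.find(p)))
  }

  // Hypothetical marker for expressions that wrap a query plan.
  sealed trait PlanExpr extends Expr

  final case class Attr(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class InSubquery(value: Expr) extends PlanExpr {
    def children: Seq[Expr] = Seq(value)
  }
  final case class DynamicPruning(child: Expr) extends Expr {
    def children: Seq[Expr] = Seq(child)
  }

  // Current check: looks only at the root expression.
  def rootOnlySkip(e: Expr, onExecutor: Boolean): Boolean =
    e.isInstanceOf[PlanExpr] && onExecutor

  // Proposed check: looks at the root and every descendant.
  def subtreeSkip(e: Expr, onExecutor: Boolean): Boolean =
    e.find(_.isInstanceOf[PlanExpr]).isDefined && onExecutor

  def main(args: Array[String]): Unit = {
    val filter = DynamicPruning(InSubquery(Attr("value")))
    println(rootOnlySkip(filter, onExecutor = true)) // false: the wrapper is not skipped
    println(subtreeSkip(filter, onExecutor = true))  // true: the nested plan expression is found
  }
}
```

With the root-only check the wrapper is not skipped, so in the real code it would still be passed to `addFunc`, which is presumably where the nested plan gets compared on the executor. Both checks return false when `onExecutor` is false, so driver-side behavior is unchanged in this sketch.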
--
This message was sent by Atlassian Jira
(v8.20.1#820001)