You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/29 10:24:58 UTC
[GitHub] [spark] HyukjinKwon opened a new pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization
HyukjinKwon opened a new pull request #35058:
URL: https://github.com/apache/spark/pull/35058
### What changes were proposed in this pull request?
This PR proposes to just ignore exceptions by transient fields on the `supportsColumnar` sanity check at `ColumnarToRowExec`.
### Why are the changes needed?
SPARK-23731 fixed the plans to be serializable by leveraging lazy but SPARK-28213 happened to refer to the lazy variable at: https://github.com/apache/spark/blob/77b164aac9764049a4820064421ef82ec0bc14fb/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L68
This can fail during canonicalization during, for example, eliminating sub common expressions:
```
java.lang.NullPointerException
at org.apache.spark.sql.execution.FileSourceScanExec.supportsColumnar$lzycompute(DataSourceScanExec.scala:280)
at org.apache.spark.sql.execution.FileSourceScanExec.supportsColumnar(DataSourceScanExec.scala:279)
at org.apache.spark.sql.execution.InputAdapter.supportsColumnar(WholeStageCodegenExec.scala:509)
at org.apache.spark.sql.execution.ColumnarToRowExec.<init>(Columnar.scala:67)
...
at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:581)
at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:580)
at org.apache.spark.sql.execution.ScalarSubquery.canonicalized$lzycompute(subquery.scala:110)
...
at org.apache.spark.sql.catalyst.expressions.ExpressionEquals.hashCode(EquivalentExpressions.scala:275)
...
at scala.collection.mutable.HashTable.findEntry$(HashTable.scala:135)
at scala.collection.mutable.HashMap.findEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.get(HashMap.scala:74)
at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExpr(EquivalentExpressions.scala:46)
at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExprTreeHelper$1(EquivalentExpressions.scala:147)
at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExprTree(EquivalentExpressions.scala:170)
at org.apache.spark.sql.catalyst.expressions.SubExprEvaluationRuntime.$anonfun$proxyExpressions$1(SubExprEvaluationRuntime.scala:89)
at org.apache.spark.sql.catalyst.expressions.SubExprEvaluationRuntime.$anonfun$proxyExpressions$1$adapted(SubExprEvaluationRuntime.scala:89)
at scala.collection.immutable.List.foreach(List.scala:392)
```
This fix is still a bandaid fix but at least addresses the issue with minimized change. This fix should ideally be backported too.
### Does this PR introduce _any_ user-facing change?
Pretty unlikely - when `ColumnarToRowExec` has to be canonicalized on the executor side (see the stacktrace), but yes. it would fix a bug.
### How was this patch tested?
Unittest was added.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1005941879
Thank you, @HyukjinKwon !
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1002859683
Merged to master, branch-3.2, branch-3.1 and branch-3.0.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1005459708
Sorry, but it seems that this broke branch-3.1 due to some mismatch.
```
[info] SparkPlanSuite:
[info] - SPARK-37779: ColumnarToRowExec should be canonicalizable after being (de)serialized *** FAILED *** (7 seconds, 1 milliseconds)
[info] ColumnarToRowExec was not canonicalizable (SparkPlanSuite.scala:105)
[info] org.scalatest.exceptions.TestFailedException:
...
[info] *** 1 TEST FAILED ***
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] org.apache.spark.sql.execution.SparkPlanSuite
[error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 378 s (06:18), completed Jan 4, 2022 11:54:14 PM
```
Weirdly, `branch-3.0` is okay. Let me revert this from branch-3.1 first. Could you make a backporting PR to branch-3.1, @HyukjinKwon ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1005459708
Sorry, but it seems that this broke branch-3.1 due to some mismatch.
```
[info] SparkPlanSuite:
[info] - SPARK-37779: ColumnarToRowExec should be canonicalizable after being (de)serialized *** FAILED *** (7 seconds, 1 milliseconds)
[info] ColumnarToRowExec was not canonicalizable (SparkPlanSuite.scala:105)
[info] org.scalatest.exceptions.TestFailedException:
...
[info] *** 1 TEST FAILED ***
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] org.apache.spark.sql.execution.SparkPlanSuite
[error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 378 s (06:18), completed Jan 4, 2022 11:54:14 PM
```
Weirdly, `branch-3.0` is okay. Let me revert this from branch-3.1 first. Could you make a backporting PR to branch-3.2, @HyukjinKwon ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1005492013
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1002859018
Thanks!
Merged to master.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon closed pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization
Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #35058:
URL: https://github.com/apache/spark/pull/35058
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon edited a comment on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization
Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1005492013
weird .. let me just revert it from branch-3.0 too for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org