You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/29 10:24:58 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

HyukjinKwon opened a new pull request #35058:
URL: https://github.com/apache/spark/pull/35058


   ### What changes were proposed in this pull request?
   
   This PR proposes to just ignore exceptions by transient fields on the `supportsColumnar` sanity check at `ColumnarToRowExec`.
   
   ### Why are the changes needed?
   
   SPARK-23731 fixed the plans to be serializable by leveraging lazy but SPARK-28213 happened to refer to the lazy variable at: https://github.com/apache/spark/blob/77b164aac9764049a4820064421ef82ec0bc14fb/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L68
   
   This can fail during canonicalization during, for example, eliminating sub common expressions:
   
   ```
   java.lang.NullPointerException
       at org.apache.spark.sql.execution.FileSourceScanExec.supportsColumnar$lzycompute(DataSourceScanExec.scala:280)
       at org.apache.spark.sql.execution.FileSourceScanExec.supportsColumnar(DataSourceScanExec.scala:279)
       at org.apache.spark.sql.execution.InputAdapter.supportsColumnar(WholeStageCodegenExec.scala:509)
       at org.apache.spark.sql.execution.ColumnarToRowExec.<init>(Columnar.scala:67)
       ...
       at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:581)
       at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:580)
       at org.apache.spark.sql.execution.ScalarSubquery.canonicalized$lzycompute(subquery.scala:110)
       ...
       at org.apache.spark.sql.catalyst.expressions.ExpressionEquals.hashCode(EquivalentExpressions.scala:275)
       ...
       at scala.collection.mutable.HashTable.findEntry$(HashTable.scala:135)
       at scala.collection.mutable.HashMap.findEntry(HashMap.scala:44)
       at scala.collection.mutable.HashMap.get(HashMap.scala:74)
       at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExpr(EquivalentExpressions.scala:46)
       at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExprTreeHelper$1(EquivalentExpressions.scala:147)
       at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExprTree(EquivalentExpressions.scala:170)
       at org.apache.spark.sql.catalyst.expressions.SubExprEvaluationRuntime.$anonfun$proxyExpressions$1(SubExprEvaluationRuntime.scala:89)
       at org.apache.spark.sql.catalyst.expressions.SubExprEvaluationRuntime.$anonfun$proxyExpressions$1$adapted(SubExprEvaluationRuntime.scala:89)
       at scala.collection.immutable.List.foreach(List.scala:392)
   ```
   
   This fix is still a bandaid fix but at least addresses the issue with minimized change. This fix should ideally be backported too.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Pretty unlikely - when `ColumnarToRowExec` has to be canonicalized on the executor side (see the stacktrace), but yes. it would fix a bug.
   
   ### How was this patch tested?
   
   Unittest was added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1005941879


   Thank you, @HyukjinKwon !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1002859683


   Merged to master, branch-3.2, branch-3.1 and branch-3.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1005459708


   Sorry, but it seems that this broke branch-3.1 due to some mismatch.
   ```
   [info] SparkPlanSuite:
   [info] - SPARK-37779: ColumnarToRowExec should be canonicalizable after being (de)serialized *** FAILED *** (7 seconds, 1 milliseconds)
   [info]   ColumnarToRowExec was not canonicalizable (SparkPlanSuite.scala:105)
   [info]   org.scalatest.exceptions.TestFailedException:
   ...
   [info] *** 1 TEST FAILED ***
   [error] Failed: Total 1, Failed 1, Errors 0, Passed 0
   [error] Failed tests:
   [error] 	org.apache.spark.sql.execution.SparkPlanSuite
   [error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
   [error] Total time: 378 s (06:18), completed Jan 4, 2022 11:54:14 PM
   ```
   
   Weirdly, `branch-3.0` is okay. Let me revert this from branch-3.1 first. Could you make a backporting PR to branch-3.1, @HyukjinKwon ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1005459708


   Sorry, but it seems that this broke branch-3.1 due to some mismatch.
   ```
   [info] SparkPlanSuite:
   [info] - SPARK-37779: ColumnarToRowExec should be canonicalizable after being (de)serialized *** FAILED *** (7 seconds, 1 milliseconds)
   [info]   ColumnarToRowExec was not canonicalizable (SparkPlanSuite.scala:105)
   [info]   org.scalatest.exceptions.TestFailedException:
   ...
   [info] *** 1 TEST FAILED ***
   [error] Failed: Total 1, Failed 1, Errors 0, Passed 0
   [error] Failed tests:
   [error] 	org.apache.spark.sql.execution.SparkPlanSuite
   [error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
   [error] Total time: 378 s (06:18), completed Jan 4, 2022 11:54:14 PM
   ```
   
   Weirdly, `branch-3.0` is okay. Let me revert this from branch-3.1 first. Could you make a backporting PR to branch-3.2, @HyukjinKwon ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1005492013






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1002859018


   Thanks! 
   
   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #35058:
URL: https://github.com/apache/spark/pull/35058


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon edited a comment on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #35058:
URL: https://github.com/apache/spark/pull/35058#issuecomment-1005492013


   weird .. let me just revert it from branch-3.0 too for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org