Posted to commits@spark.apache.org by do...@apache.org on 2023/06/08 20:13:00 UTC

[spark] branch master updated: [SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 4168e1ac3c1 [SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on
4168e1ac3c1 is described below

commit 4168e1ac3c1b44298d2c6eae31e7f6cf948614a3
Author: Jia Fan <fa...@qq.com>
AuthorDate: Thu Jun 8 13:12:45 2023 -0700

    [SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on
    
    ### What changes were proposed in this pull request?
    When we use spark shell to submit job like this:
    ```scala
    $ spark-shell --conf spark.driver.memory=1g
    
    val df = spark.range(5000000).withColumn("str", lit("abcdabcdabcdabcdabasgasdfsadfasdfasdfasfasfsadfasdfsadfasdf"))
    val df2 = spark.range(10).join(broadcast(df), Seq("id"), "left_outer")
    
    df2.collect
    ```
    This will cause the driver to hang indefinitely.
    When AQE is disabled, a `java.lang.OutOfMemoryError` is thrown instead.
    
    After checking the code, the root cause is an incorrect use of `Throwable::initCause`. It happens when an OOM is thrown at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L184 . Then https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L2401 is executed,
    which calls `new SparkException(..., cause = oe).initCause(oe.getCause)`.
    The doc for `Throwable::initCause` says:
    ```
    This method can be called at most once. It is generally called from within the constructor,
    or immediately after creating the throwable. If this throwable was created with Throwable(Throwable)
    or Throwable(String, Throwable), this method cannot be called even once.
    ```
    So when we call it, an `IllegalStateException` is thrown, `promise.tryFailure(ex)` is never called, and the driver blocks forever.
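    The failure mode can be sketched in plain Java, outside Spark (a hypothetical minimal example, not the actual `BroadcastExchangeExec` code; the class, `promise`, and variable names are invented for illustration):
    
    ```java
    import java.util.concurrent.CompletableFuture;

    public class BroadcastHangDemo {
        // Stand-in for the promise the broadcast thread should fail on error.
        static final CompletableFuture<String> promise = new CompletableFuture<>();

        static void buildBroadcast() {
            try {
                throw new OutOfMemoryError("not enough memory to build the broadcast table");
            } catch (OutOfMemoryError oe) {
                try {
                    // Constructed with Throwable(String, Throwable): the cause is already set...
                    RuntimeException wrapped = new RuntimeException("wrapper", oe);
                    // ...so initCause throws IllegalStateException here,
                    wrapped.initCause(oe.getCause());
                    // and the line that would report the failure is never reached.
                    promise.completeExceptionally(wrapped);
                } catch (IllegalStateException escaped) {
                    // In Spark this exception escapes the broadcast thread;
                    // either way the promise stays incomplete.
                }
            }
        }

        public static void main(String[] args) {
            buildBroadcast();
            // promise.isDone() is false, so any get()/await on it hangs forever.
            System.out.println("promise completed: " + promise.isDone());
        }
    }
    ```
    
    With the fix below, the cause is passed exactly once through the constructor (`cause = oe.getCause`) and `initCause` is never called, so the exception reaches the promise and the OOM is reported.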
    
    ### Why are the changes needed?
    Fix a bug where the OOM error is never reported.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Add new test
    
    Closes #41517 from Hisoka-X/SPARK-42290_OOM_AQE_On.
    
    Authored-by: Jia Fan <fa...@qq.com>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 .../main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 2 +-
 .../scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index fd09e99b9ee..68243233216 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -2398,7 +2398,7 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase {
         "autoBroadcastjoinThreshold" -> SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key,
         "driverMemory" -> SparkLauncher.DRIVER_MEMORY,
         "analyzeTblMsg" -> analyzeTblMsg),
-      cause = oe).initCause(oe.getCause)
+      cause = oe.getCause)
   }
 
   def executeCodePathUnsupportedError(execName: String): SparkUnsupportedOperationException = {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
index 61349c38d2b..069fce237f2 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
@@ -308,6 +308,10 @@ class QueryExecutionErrorsSuite
     }
   }
 
+  test("SPARK-42290: NotEnoughMemory error can't be create") {
+    QueryExecutionErrors.notEnoughMemoryToBuildAndBroadcastTableError(new OutOfMemoryError(), Seq())
+  }
+
   test("UNSUPPORTED_FEATURE - SPARK-38504: can't read TimestampNTZ as TimestampLTZ") {
     withTempPath { file =>
       sql("select timestamp_ntz'2019-03-21 00:02:03'").write.orc(file.getCanonicalPath)

