Posted to commits@spark.apache.org by do...@apache.org on 2023/06/08 20:24:05 UTC

[spark] branch branch-3.4 updated: [SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 020eb69722a [SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on
020eb69722a is described below

commit 020eb69722ae6c1e39f14f7855d5d767efc7e499
Author: Jia Fan <fa...@qq.com>
AuthorDate: Thu Jun 8 13:12:45 2023 -0700

    [SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on
    
    ### What changes were proposed in this pull request?
    When we submit a job from spark-shell like this:
    ```scala
    $ spark-shell --conf spark.driver.memory=1g
    
    val df = spark.range(5000000).withColumn("str", lit("abcdabcdabcdabcdabasgasdfsadfasdfasdfasfasfsadfasdfsadfasdf"))
    val df2 = spark.range(10).join(broadcast(df), Seq("id"), "left_outer")
    
    df2.collect
    ```
    the driver hangs indefinitely.
    With AQE disabled, a `java.lang.OutOfMemoryError` is thrown instead.
    
    After checking the code, the cause turns out to be an incorrect use of `Throwable::initCause`. The problem occurs when an OOM is thrown at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L184 . The code at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L2401 is then executed.
    It uses `new SparkException(..., cause=oe).initCause(oe.getCause)`.
    The documentation for `Throwable::initCause` says:
    ```
    This method can be called at most once. It is generally called from within the constructor,
    or immediately after creating the throwable. If this throwable was created with Throwable(Throwable)
    or Throwable(String, Throwable), this method cannot be called even once.
    ```
    So when we call it, an `IllegalStateException` is thrown. As a result, `promise.tryFailure(ex)` is never called, and the driver blocks.
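    
    The failure mode described above can be reproduced outside Spark. The following is a minimal sketch, not the Spark code itself: `InitCauseDemo`, `buildErrorBeforeFix`, and `buildErrorAfterFix` are hypothetical names, and a plain `RuntimeException` stands in for `SparkException`. It shows that calling `initCause` on a throwable constructed with a cause throws `IllegalStateException` before `tryFailure` is ever reached, which leaves the promise uncompleted.
    ```scala
    import scala.concurrent.Promise
    import scala.util.Try
    
    object InitCauseDemo {
      // Hypothetical stand-in for the pre-fix code path: the cause is passed to the
      // constructor, and initCause is then called on the same throwable.
      def buildErrorBeforeFix(oe: OutOfMemoryError): Throwable =
        new RuntimeException("not enough memory", oe).initCause(oe.getCause)
    
      // Hypothetical stand-in for the fixed code path: the cause is set exactly once,
      // through the constructor.
      def buildErrorAfterFix(oe: OutOfMemoryError): Throwable =
        new RuntimeException("not enough memory", oe.getCause)
    
      def main(args: Array[String]): Unit = {
        val oom = new OutOfMemoryError("simulated")
        val promise = Promise[Unit]()
    
        // The argument is evaluated first and throws, so tryFailure never runs.
        val attempt = Try(promise.tryFailure(buildErrorBeforeFix(oom)))
        println(s"before fix: threw=${attempt.isFailure}, completed=${promise.isCompleted}")
    
        // With the fixed builder the promise is failed, so a waiter would be released.
        promise.tryFailure(buildErrorAfterFix(oom))
        println(s"after fix: completed=${promise.isCompleted}")
      }
    }
    ```
    Anything waiting on the promise's future in the first case would wait forever, which matches the hang seen on the driver.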
    
    ### Why are the changes needed?
    Fix the bug where the OOM error is never reported.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Added a new test.
    
    Closes #41517 from Hisoka-X/SPARK-42290_OOM_AQE_On.
    
    Authored-by: Jia Fan <fa...@qq.com>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
    (cherry picked from commit 4168e1ac3c1b44298d2c6eae31e7f6cf948614a3)
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 .../main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 2 +-
 .../scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 8d79713dd0d..2865ce5492f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -2380,7 +2380,7 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase {
         "autoBroadcastjoinThreshold" -> SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key,
         "driverMemory" -> SparkLauncher.DRIVER_MEMORY,
         "analyzeTblMsg" -> analyzeTblMsg),
-      cause = oe).initCause(oe.getCause)
+      cause = oe.getCause)
   }
 
   def executeCodePathUnsupportedError(execName: String): SparkUnsupportedOperationException = {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
index 27dbe45952e..c0ec8a58bd5 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
@@ -301,6 +301,10 @@ class QueryExecutionErrorsSuite
     }
   }
 
+  test("SPARK-42290: NotEnoughMemory error can't be create") {
+    QueryExecutionErrors.notEnoughMemoryToBuildAndBroadcastTableError(new OutOfMemoryError(), Seq())
+  }
+
   test("UNSUPPORTED_FEATURE - SPARK-38504: can't read TimestampNTZ as TimestampLTZ") {
     withTempPath { file =>
       sql("select timestamp_ntz'2019-03-21 00:02:03'").write.orc(file.getCanonicalPath)

