You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jinxing64 <gi...@git.apache.org> on 2018/05/24 11:57:31 UTC

[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

GitHub user jinxing64 opened a pull request:

    https://github.com/apache/spark/pull/21424

    [SPARK-24379] BroadcastExchangeExec should catch SparkOutOfMemory and re-throw SparkFatalException, which wraps SparkOutOfMemory inside.

    ## What changes were proposed in this pull request?
    
    After https://github.com/apache/spark/pull/20014, Spark won't fails the entire executor but only fails the task suffering `SparkOutOfMemoryError`. After https://github.com/apache/spark/pull/21342,  `BroadcastExchangeExec` try-catch `OutOfMemoryError`. Think about below scenario:
    
    1. `SparkOutOfMemoryError`(subclass of `OutOfMemoryError`) is thrown in `scala.concurrent.Future`;
    2. `SparkOutOfMemoryError` is caught and an `OutOfMemoryError` is wrapped in `SparkFatalException` and re-thrown;
    3. `ThreadUtils.awaitResult` catches `SparkFatalException` and a `OutOfMemoryError` is thrown;
    4. The `OutOfMemoryErro`r will go to `SparkUncaughtExceptionHandler.uncaughtException` and Executor fails.
    So it makes more sense to catch `SparkOutOfMemory` and re-throw `SparkFatalException`, which wraps `SparkOutOfMemory` inside.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-24379

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21424
    
----
commit aa10470b7b09a100ee80afedb29b24548fbe5512
Author: jinxing <ji...@...>
Date:   2018-05-24T11:51:40Z

    [SPARK-24379] BroadcastExchangeExec should catch SparkOutOfMemory and re-throw SparkFatalException, which wraps SparkOutOfMemory inside.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    **[Test build #91105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91105/testReport)** for PR 21424 at commit [`aa10470`](https://github.com/apache/spark/commit/aa10470b7b09a100ee80afedb29b24548fbe5512).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3552/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3621/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91105/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21424#discussion_r191104413
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -106,11 +108,20 @@ private[execution] object HashedRelation {
               1),
             0)
         }
    -
    -    if (key.length == 1 && key.head.dataType == LongType) {
    -      LongHashedRelation(input, key, sizeEstimate, mm)
    -    } else {
    -      UnsafeHashedRelation(input, key, sizeEstimate, mm)
    +    try {
    +      if (key.length == 1 && key.head.dataType == LongType) {
    +        LongHashedRelation(input, key, sizeEstimate, mm)
    +      } else {
    +        UnsafeHashedRelation(input, key, sizeEstimate, mm)
    +      }
    +    } catch {
    +      case oe: SparkOutOfMemoryError =>
    +        throw new SparkOutOfMemoryError(s"If this SparkOutOfMemoryError happens in Spark driver," +
    --- End diff --
    
    ah i see. So the `SparkOutOfMemoryError` is thrown by `BytesToBytesMap`, we need to catch and rethrow it to attach the error message anyway.
    
    I also found that we may throw OOM when calling `child.executeCollectIterator` which calls `RDD#collect`, seems the previous code is corrected.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    **[Test build #91203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91203/testReport)** for PR 21424 at commit [`8f224fb`](https://github.com/apache/spark/commit/8f224fbedda0d2e126810918c99291eb395d6bab).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 closed the pull request at:

    https://github.com/apache/spark/pull/21424


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21424#discussion_r191118494
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -106,11 +108,20 @@ private[execution] object HashedRelation {
               1),
             0)
         }
    -
    -    if (key.length == 1 && key.head.dataType == LongType) {
    -      LongHashedRelation(input, key, sizeEstimate, mm)
    -    } else {
    -      UnsafeHashedRelation(input, key, sizeEstimate, mm)
    +    try {
    +      if (key.length == 1 && key.head.dataType == LongType) {
    +        LongHashedRelation(input, key, sizeEstimate, mm)
    +      } else {
    +        UnsafeHashedRelation(input, key, sizeEstimate, mm)
    +      }
    +    } catch {
    +      case oe: SparkOutOfMemoryError =>
    +        throw new SparkOutOfMemoryError(s"If this SparkOutOfMemoryError happens in Spark driver," +
    --- End diff --
    
    it seems we don't need to change anything, maybe just add some comments to say where OOM can occur, i.e. `RDD#collect` and `BroadcastMode#transform`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21424#discussion_r191107018
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -106,11 +108,20 @@ private[execution] object HashedRelation {
               1),
             0)
         }
    -
    -    if (key.length == 1 && key.head.dataType == LongType) {
    -      LongHashedRelation(input, key, sizeEstimate, mm)
    -    } else {
    -      UnsafeHashedRelation(input, key, sizeEstimate, mm)
    +    try {
    +      if (key.length == 1 && key.head.dataType == LongType) {
    +        LongHashedRelation(input, key, sizeEstimate, mm)
    +      } else {
    +        UnsafeHashedRelation(input, key, sizeEstimate, mm)
    +      }
    +    } catch {
    +      case oe: SparkOutOfMemoryError =>
    +        throw new SparkOutOfMemoryError(s"If this SparkOutOfMemoryError happens in Spark driver," +
    --- End diff --
    
    So, I change back ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    **[Test build #91105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91105/testReport)** for PR 21424 at commit [`aa10470`](https://github.com/apache/spark/commit/aa10470b7b09a100ee80afedb29b24548fbe5512).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    > Broadcast is special because it's run on driver side, so SparkOutOfMemoryError is same as OutOfMemoryError, we need to kill the driver anyway. 
    
    Thanks a lot for deep explanation, I will close this pr and leave it as unchanged :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    cc @cloud-fan @JoshRosen 
    Would you please help take a look at this when you have time ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91204/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    **[Test build #91204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91204/testReport)** for PR 21424 at commit [`3a9669c`](https://github.com/apache/spark/commit/3a9669cff3486aa98cf0a49e69cc0e08d927affd).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    I left the JIRA resolved as "Won't fix" given the discussion above.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    @cloud-fan 
    > I also found that we may throw OOM
    
    My previous understanding is that Spark throw `SparkOutOfMemoryError` when expect there's no memory -- such expectation is from Spark scope of memory management. So that it's safe to catch `SparkOutOfMemoryError` and mark the corresponding task as failed rather than the executor.
    
    If `OutOfMemoryError` is thrown from JVM, e.g. OOM when `new Array[bigSize]`, is it ok to catch it and continue running the executor as if nothing happened ? If it's ok, does it mean that an executor should never exit when `OutOfMemoryError`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    **[Test build #91203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91203/testReport)** for PR 21424 at commit [`8f224fb`](https://github.com/apache/spark/commit/8f224fbedda0d2e126810918c99291eb395d6bab).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91203/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    **[Test build #91204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91204/testReport)** for PR 21424 at commit [`3a9669c`](https://github.com/apache/spark/commit/3a9669cff3486aa98cf0a49e69cc0e08d927affd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    That's a good point!
    
    According to the document of `SparkOutOfMemoryError`, Spark should not kill the executor if `SparkOutOfMemoryError` is thrown. Broadcast is special because it's run on driver side, so `SparkOutOfMemoryError` is same as `OutOfMemoryError`, we need to kill the driver anyway.
    
    That said, I think it's fine to catch `OutOfMemoryError` and enhance the error message here, to give users some details about why the job failed.
    
    One thing we can do for future safety: do not change the exception type when enhancing the error message.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21424
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21424#discussion_r190577287
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
    @@ -115,9 +116,9 @@ case class BroadcastExchangeExec(
               // SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we throw
               // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
               // will catch this exception and re-throw the wrapped fatal throwable.
    -          case oe: OutOfMemoryError =>
    +          case oe: SparkOutOfMemoryError =>
                 throw new SparkFatalException(
    -              new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
    +              new SparkOutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
    --- End diff --
    
    since we fully control the creation of `SparkOutOfMemoryError`, can we move the error message to where we throw `SparkOutOfMemoryError` when building hash relation?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org