You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jinxing64 <gi...@git.apache.org> on 2018/05/24 11:57:31 UTC
[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/21424
[SPARK-24379] BroadcastExchangeExec should catch SparkOutOfMemory and re-throw SparkFatalException, which wraps SparkOutOfMemory inside.
## What changes were proposed in this pull request?
After https://github.com/apache/spark/pull/20014, Spark won't fails the entire executor but only fails the task suffering `SparkOutOfMemoryError`. After https://github.com/apache/spark/pull/21342, `BroadcastExchangeExec` try-catch `OutOfMemoryError`. Think about below scenario:
1. `SparkOutOfMemoryError`(subclass of `OutOfMemoryError`) is thrown in `scala.concurrent.Future`;
2. `SparkOutOfMemoryError` is caught and an `OutOfMemoryError` is wrapped in `SparkFatalException` and re-thrown;
3. `ThreadUtils.awaitResult` catches `SparkFatalException` and a `OutOfMemoryError` is thrown;
4. The `OutOfMemoryErro`r will go to `SparkUncaughtExceptionHandler.uncaughtException` and Executor fails.
So it makes more sense to catch `SparkOutOfMemory` and re-throw `SparkFatalException`, which wraps `SparkOutOfMemory` inside.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jinxing64/spark SPARK-24379
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21424.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21424
----
commit aa10470b7b09a100ee80afedb29b24548fbe5512
Author: jinxing <ji...@...>
Date: 2018-05-24T11:51:40Z
[SPARK-24379] BroadcastExchangeExec should catch SparkOutOfMemory and re-throw SparkFatalException, which wraps SparkOutOfMemory inside.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21424
**[Test build #91105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91105/testReport)** for PR 21424 at commit [`aa10470`](https://github.com/apache/spark/commit/aa10470b7b09a100ee80afedb29b24548fbe5512).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21424
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3552/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21424
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3621/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21424
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91105/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21424#discussion_r191104413
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -106,11 +108,20 @@ private[execution] object HashedRelation {
1),
0)
}
-
- if (key.length == 1 && key.head.dataType == LongType) {
- LongHashedRelation(input, key, sizeEstimate, mm)
- } else {
- UnsafeHashedRelation(input, key, sizeEstimate, mm)
+ try {
+ if (key.length == 1 && key.head.dataType == LongType) {
+ LongHashedRelation(input, key, sizeEstimate, mm)
+ } else {
+ UnsafeHashedRelation(input, key, sizeEstimate, mm)
+ }
+ } catch {
+ case oe: SparkOutOfMemoryError =>
+ throw new SparkOutOfMemoryError(s"If this SparkOutOfMemoryError happens in Spark driver," +
--- End diff --
ah i see. So the `SparkOutOfMemoryError` is thrown by `BytesToBytesMap`, we need to catch and rethrow it to attach the error message anyway.
I also found that we may throw OOM when calling `child.executeCollectIterator` which calls `RDD#collect`, seems the previous code is corrected.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21424
**[Test build #91203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91203/testReport)** for PR 21424 at commit [`8f224fb`](https://github.com/apache/spark/commit/8f224fbedda0d2e126810918c99291eb395d6bab).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...
Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 closed the pull request at:
https://github.com/apache/spark/pull/21424
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21424#discussion_r191118494
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -106,11 +108,20 @@ private[execution] object HashedRelation {
1),
0)
}
-
- if (key.length == 1 && key.head.dataType == LongType) {
- LongHashedRelation(input, key, sizeEstimate, mm)
- } else {
- UnsafeHashedRelation(input, key, sizeEstimate, mm)
+ try {
+ if (key.length == 1 && key.head.dataType == LongType) {
+ LongHashedRelation(input, key, sizeEstimate, mm)
+ } else {
+ UnsafeHashedRelation(input, key, sizeEstimate, mm)
+ }
+ } catch {
+ case oe: SparkOutOfMemoryError =>
+ throw new SparkOutOfMemoryError(s"If this SparkOutOfMemoryError happens in Spark driver," +
--- End diff --
it seems we don't need to change anything, maybe just add some comments to say where OOM can occur, i.e. `RDD#collect` and `BroadcastMode#transform`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21424
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...
Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21424#discussion_r191107018
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -106,11 +108,20 @@ private[execution] object HashedRelation {
1),
0)
}
-
- if (key.length == 1 && key.head.dataType == LongType) {
- LongHashedRelation(input, key, sizeEstimate, mm)
- } else {
- UnsafeHashedRelation(input, key, sizeEstimate, mm)
+ try {
+ if (key.length == 1 && key.head.dataType == LongType) {
+ LongHashedRelation(input, key, sizeEstimate, mm)
+ } else {
+ UnsafeHashedRelation(input, key, sizeEstimate, mm)
+ }
+ } catch {
+ case oe: SparkOutOfMemoryError =>
+ throw new SparkOutOfMemoryError(s"If this SparkOutOfMemoryError happens in Spark driver," +
--- End diff --
So, I change back ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21424
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21424
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21424
**[Test build #91105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91105/testReport)** for PR 21424 at commit [`aa10470`](https://github.com/apache/spark/commit/aa10470b7b09a100ee80afedb29b24548fbe5512).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/21424
> Broadcast is special because it's run on driver side, so SparkOutOfMemoryError is same as OutOfMemoryError, we need to kill the driver anyway.
Thanks a lot for deep explanation, I will close this pr and leave it as unchanged :)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/21424
cc @cloud-fan @JoshRosen
Would you please help take a look at this when you have time ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21424
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91204/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21424
**[Test build #91204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91204/testReport)** for PR 21424 at commit [`3a9669c`](https://github.com/apache/spark/commit/3a9669cff3486aa98cf0a49e69cc0e08d927affd).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21424
I left the JIRA resolved as "Won't fix" given the discussion above.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/21424
@cloud-fan
> I also found that we may throw OOM
My previous understanding is that Spark throw `SparkOutOfMemoryError` when expect there's no memory -- such expectation is from Spark scope of memory management. So that it's safe to catch `SparkOutOfMemoryError` and mark the corresponding task as failed rather than the executor.
If `OutOfMemoryError` is thrown from JVM, e.g. OOM when `new Array[bigSize]`, is it ok to catch it and continue running the executor as if nothing happened ? If it's ok, does it mean that an executor should never exit when `OutOfMemoryError`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21424
**[Test build #91203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91203/testReport)** for PR 21424 at commit [`8f224fb`](https://github.com/apache/spark/commit/8f224fbedda0d2e126810918c99291eb395d6bab).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21424
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21424
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91203/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21424
**[Test build #91204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91204/testReport)** for PR 21424 at commit [`3a9669c`](https://github.com/apache/spark/commit/3a9669cff3486aa98cf0a49e69cc0e08d927affd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21424
That's a good point!
According to the document of `SparkOutOfMemoryError`, Spark should not kill the executor if `SparkOutOfMemoryError` is thrown. Broadcast is special because it's run on driver side, so `SparkOutOfMemoryError` is same as `OutOfMemoryError`, we need to kill the driver anyway.
That said, I think it's fine to catch `OutOfMemoryError` and enhance the error message here, to give users some details about why the job failed.
One thing we can do for future safety: do not change the exception type when enhancing the error message.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21424
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21424#discussion_r190577287
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
@@ -115,9 +116,9 @@ case class BroadcastExchangeExec(
// SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we throw
// SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
// will catch this exception and re-throw the wrapped fatal throwable.
- case oe: OutOfMemoryError =>
+ case oe: SparkOutOfMemoryError =>
throw new SparkFatalException(
- new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
+ new SparkOutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
--- End diff --
since we fully control the creation of `SparkOutOfMemoryError`, can we move the error message to where we throw `SparkOutOfMemoryError` when building hash relation?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org