Posted to reviews@spark.apache.org by caneGuy <gi...@git.apache.org> on 2018/06/11 09:20:57 UTC
[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...
GitHub user caneGuy opened a pull request:
https://github.com/apache/spark/pull/21526
[SPARK-24515][CORE] No need to warning when output commit coordination enabled
## What changes were proposed in this pull request?
There is no need to warn users about a direct output committer when output commit coordination is enabled. The current warning logic is:
```
// When speculation is on and output committer class name contains "Direct", we should warn
// users that they may lose data if they are using a direct output committer.
val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
if (speculationEnabled && outputCommitterClass.contains("Direct")) {
  val warningMessage =
    s"$outputCommitterClass may be an output committer that writes data directly to " +
      "the final location. Because speculation is enabled, this output committer may " +
      "cause data loss (see the case in SPARK-10063). If possible, please use an output " +
      "committer that does not have this behavior (e.g. FileOutputCommitter)."
  logWarning(warningMessage)
}
```
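To make the existing heuristic easy to test in isolation, it can be modeled as a pure function. This is a minimal sketch, assuming plain `Map[String, String]` values stand in for `SparkConf` and the Hadoop `Configuration`; the object and method names are hypothetical and not part of Spark.

```scala
// Hypothetical stand-in for the check in PairRDDFunctions; plain maps replace
// SparkConf and the Hadoop Configuration so the sketch runs on its own.
object OriginalDirectCommitterCheck {

  // Returns the warning text when the heuristic fires, None otherwise.
  def warning(sparkConf: Map[String, String],
              hadoopConf: Map[String, String]): Option[String] = {
    // Mirrors self.conf.getBoolean("spark.speculation", false): absent means false.
    val speculationEnabled =
      sparkConf.get("spark.speculation").map(_.trim.toBoolean).getOrElse(false)
    // Mirrors hadoopConf.get("mapred.output.committer.class", "").
    val committerClass = hadoopConf.getOrElse("mapred.output.committer.class", "")
    // Name-based heuristic: any committer class whose name contains "Direct" is suspect.
    if (speculationEnabled && committerClass.contains("Direct"))
      Some(s"$committerClass may be an output committer that writes data directly " +
        "to the final location. Because speculation is enabled, this output " +
        "committer may cause data loss (see the case in SPARK-10063).")
    else None
  }
}
```

Because the check only inspects the class name, a committer that writes directly but lacks "Direct" in its name would not trigger the warning.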
## How was this patch tested?
Unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/caneGuy/spark zhoukang/fix-warning
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21526.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21526
----
commit 6bac1531929e914764d980e4eb4228a10436876b
Author: zhoukang <zh...@...>
Date: 2018-06-11T08:57:11Z
[SPARK][CORE] No need to warning when output commit coordination enabled
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21526
**[Test build #92923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92923/testReport)** for PR 21526 at commit [`ee450f5`](https://github.com/apache/spark/commit/ee450f517b3df5c61ed6cce5513ec07fc898590b).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21526
**[Test build #92929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92929/testReport)** for PR 21526 at commit [`63a62db`](https://github.com/apache/spark/commit/63a62dbf984aa760e48a16c014cfcf4a91fcfd7e).
* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21526
Can one of the admins verify this patch?
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21526
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92931/
Test FAILed.
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21526
**[Test build #92923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92923/testReport)** for PR 21526 at commit [`ee450f5`](https://github.com/apache/spark/commit/ee450f517b3df5c61ed6cce5513ec07fc898590b).
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21526
**[Test build #92929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92929/testReport)** for PR 21526 at commit [`63a62db`](https://github.com/apache/spark/commit/63a62dbf984aa760e48a16c014cfcf4a91fcfd7e).
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21526
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92923/
Test FAILed.
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21526
Can one of the admins verify this patch?
---
[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...
Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:
https://github.com/apache/spark/pull/21526#discussion_r202006002
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -357,6 +357,11 @@ package object config {
.intConf
.createWithDefault(256)
+ private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
+ ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
+ .booleanConf
--- End diff --
Done @cloud-fan Thanks!
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21526
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21526
Hmm, I think the problem with using a direct committer is not about speculative tasks, but about not being able to roll back an already committed task. So the output commit coordinator can't help here.
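A toy model can make this point concrete. It is a sketch under simplifying assumptions: every class and method below is hypothetical and does not reflect Spark's real `OutputCommitCoordinator` or the Hadoop committer APIs. An attempt that writes straight to the final location and then fails leaves its output behind even though the coordinator never authorized a commit, whereas a staging committer simply discards its staged output on abort.

```scala
import scala.collection.mutable

// Toy model (hypothetical names, not Spark's API) of why commit coordination
// cannot undo output that a direct committer has already written.
object DirectCommitterModel {

  // Grants the commit for a partition to the first attempt that asks.
  final class ToyCoordinator {
    private val winners = mutable.Map.empty[Int, Int] // partition -> attempt
    def canCommit(partition: Int, attempt: Int): Boolean =
      winners.getOrElseUpdate(partition, attempt) == attempt
  }

  // Final output location visible to readers, plus a private staging area.
  final class ToyStorage {
    val finalDir = mutable.Map.empty[Int, String]             // partition -> data
    val staging = mutable.Map.empty[(Int, Int), String]       // (partition, attempt) -> data
  }

  // Direct committer: output is visible in the final location as soon as it is written.
  def runDirect(s: ToyStorage, c: ToyCoordinator, partition: Int, attempt: Int,
                data: String, failsBeforeCommit: Boolean): Unit = {
    s.finalDir(partition) = data
    if (!failsBeforeCommit) {
      // Asking the coordinator changes nothing: the data is already in place.
      c.canCommit(partition, attempt)
    }
    // If the attempt failed, there is no staged copy to abort or roll back.
  }

  // Staging committer: output reaches the final location only on an authorized commit.
  def runStaged(s: ToyStorage, c: ToyCoordinator, partition: Int, attempt: Int,
                data: String, failsBeforeCommit: Boolean): Unit = {
    s.staging((partition, attempt)) = data
    if (failsBeforeCommit) {
      s.staging.remove((partition, attempt)) // abort: discard the staged output
    } else if (c.canCommit(partition, attempt)) {
      s.staging.remove((partition, attempt)).foreach(d => s.finalDir(partition) = d)
    }
  }
}
```

In this model the coordinator only arbitrates which attempt may commit; it has no way to remove bytes that were already written to the final location.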
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21526
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...
Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:
https://github.com/apache/spark/pull/21526#discussion_r202005902
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -357,6 +357,11 @@ package object config {
.intConf
.createWithDefault(256)
+ private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
+ ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
+ .booleanConf
--- End diff --
Done @cloud-fan
---
[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...
Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:
https://github.com/apache/spark/pull/21526#discussion_r201962635
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
// users that they may loss data if they are using a direct output committer.
val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
- if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+ val outputCommitCoordinationEnabled = self.conf.getBoolean(
+ "spark.hadoop.outputCommitCoordination.enabled", true)
--- End diff --
Updated. @cloud-fan Thanks
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21526
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92929/
Test FAILed.
---
[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21526#discussion_r201720609
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
// users that they may loss data if they are using a direct output committer.
val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
- if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+ val outputCommitCoordinationEnabled = self.conf.getBoolean(
+ "spark.hadoop.outputCommitCoordination.enabled", true)
+ if (speculationEnabled && outputCommitterClass.contains("Direct")
+ && !outputCommitCoordinationEnabled) {
val warningMessage =
--- End diff --
Yea, I found a similar message in `HiveFileFormat`
---
[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...
Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:
https://github.com/apache/spark/pull/21526#discussion_r201963160
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
// users that they may loss data if they are using a direct output committer.
val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
- if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+ val outputCommitCoordinationEnabled = self.conf.getBoolean(
+ "spark.hadoop.outputCommitCoordination.enabled", true)
+ if (speculationEnabled && outputCommitterClass.contains("Direct")
+ && !outputCommitCoordinationEnabled) {
val warningMessage =
--- End diff --
Also modified `HiveFileFormat`. @cloud-fan @jiangxb1987
I did not refactor the check into a common function because I couldn't find a good place to put it. Any suggestions?
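One possible shape for such a shared helper is sketched below. It is hypothetical: the object name and location are not from Spark, and it simply encodes the condition proposed in this PR (adding `!outputCommitCoordinationEnabled`), which cloud-fan's comment in this thread questions on correctness grounds. The sketch only illustrates how `PairRDDFunctions` and `HiveFileFormat` could share one check and one message.

```scala
// Hypothetical shared helper (not in Spark) that both PairRDDFunctions and
// HiveFileFormat could call instead of duplicating the check.
object DirectCommitterWarning {

  // Condition proposed in this PR: warn only when speculation is on, the
  // committer name contains "Direct", and commit coordination is disabled.
  def shouldWarn(speculationEnabled: Boolean,
                 committerClass: String,
                 commitCoordinationEnabled: Boolean): Boolean =
    speculationEnabled && committerClass.contains("Direct") && !commitCoordinationEnabled

  // Builds the message once so both call sites log identical text.
  def message(committerClass: String): String =
    s"$committerClass may be an output committer that writes data directly to " +
      "the final location. Because speculation is enabled, this output committer may " +
      "cause data loss (see the case in SPARK-10063). If possible, please use an output " +
      "committer that does not have this behavior (e.g. FileOutputCommitter)."

  // Convenience wrapper: Some(message) when a warning should be logged.
  def check(speculationEnabled: Boolean,
            committerClass: String,
            commitCoordinationEnabled: Boolean): Option[String] =
    if (shouldWarn(speculationEnabled, committerClass, commitCoordinationEnabled))
      Some(message(committerClass))
    else None
}
```

Each call site would then read its own configuration and call something like `check(...).foreach(logWarning(_))`, keeping the condition and wording in one place.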
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21526
**[Test build #92931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92931/testReport)** for PR 21526 at commit [`c233c72`](https://github.com/apache/spark/commit/c233c725110d66fc712a96a253684cdb02b19e23).
* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21526#discussion_r202005442
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -357,6 +357,11 @@ package object config {
.intConf
.createWithDefault(256)
+ private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
+ ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
+ .booleanConf
--- End diff --
.doc("when enabled, tasks will coordinate with the driver to make sure that, for a certain partition, at most one task attempt can commit.")
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21526
**[Test build #92931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92931/testReport)** for PR 21526 at commit [`c233c72`](https://github.com/apache/spark/commit/c233c725110d66fc712a96a253684cdb02b19e23).
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21526
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21526#discussion_r201717626
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
// users that they may loss data if they are using a direct output committer.
val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
- if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+ val outputCommitCoordinationEnabled = self.conf.getBoolean(
+ "spark.hadoop.outputCommitCoordination.enabled", true)
--- End diff --
Since we are touching it, can we define this config in `org.apache.spark.internal.config`?
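The benefit of a central definition is that the key, default, and documentation live in one place instead of being repeated as string literals at each `getBoolean` call. The sketch below mimics that idea with a standalone case class; it is an assumption-laden illustration, since Spark's real `ConfigBuilder`/`ConfigEntry` types are `private[spark]` and are not reproduced here. The key and the `true` default come from the diff above, and the doc text follows the wording suggested in review.

```scala
// Hypothetical, simplified model of a centrally defined boolean config entry.
// It only mimics the idea of a single key/default/doc definition; it is not
// Spark's ConfigBuilder API.
final case class BooleanEntry(key: String, default: Boolean, doc: String) {
  // Reads the value from a plain key/value map, falling back to the default.
  def readFrom(conf: Map[String, String]): Boolean =
    conf.get(key).map(_.trim.toBoolean).getOrElse(default)
}

object CentralConfig {
  // Key and default match getBoolean("spark.hadoop.outputCommitCoordination.enabled", true).
  val HadoopOutputCommitCoordinationEnabled: BooleanEntry = BooleanEntry(
    key = "spark.hadoop.outputCommitCoordination.enabled",
    default = true,
    doc = "When enabled, tasks will coordinate with the driver to make sure that, " +
      "for a certain partition, at most one task attempt can commit.")
}
```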
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21526
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...
Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy closed the pull request at:
https://github.com/apache/spark/pull/21526
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21526
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91656/
Test PASSed.
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21526
**[Test build #91656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91656/testReport)** for PR 21526 at commit [`6bac153`](https://github.com/apache/spark/commit/6bac1531929e914764d980e4eb4228a10436876b).
---
[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21526#discussion_r201717982
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
// users that they may loss data if they are using a direct output committer.
val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
- if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+ val outputCommitCoordinationEnabled = self.conf.getBoolean(
+ "spark.hadoop.outputCommitCoordination.enabled", true)
+ if (speculationEnabled && outputCommitterClass.contains("Direct")
+ && !outputCommitCoordinationEnabled) {
val warningMessage =
--- End diff --
Is this the only place? IIRC we log a warning for this in several places.
---
[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21526
**[Test build #91656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91656/testReport)** for PR 21526 at commit [`6bac153`](https://github.com/apache/spark/commit/6bac1531929e914764d980e4eb4228a10436876b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---