Posted to reviews@spark.apache.org by caneGuy <gi...@git.apache.org> on 2018/06/11 09:20:57 UTC

[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

GitHub user caneGuy opened a pull request:

    https://github.com/apache/spark/pull/21526

    [SPARK-24515][CORE] No need to warning when output commit coordination enabled

    ## What changes were proposed in this pull request?
    
    No need to warn users when output commit coordination is enabled.
    ```
    // When speculation is on and output committer class name contains "Direct", we should warn
    // users that they may lose data if they are using a direct output committer.
    val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
    val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
     val warningMessage =
     s"$outputCommitterClass may be an output committer that writes data directly to " +
     "the final location. Because speculation is enabled, this output committer may " +
     "cause data loss (see the case in SPARK-10063). If possible, please use an output " +
     "committer that does not have this behavior (e.g. FileOutputCommitter)."
     logWarning(warningMessage)
    }
    ```
    
    ## How was this patch tested?
    Unit tests.
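
The condition this pull request proposes, as it appears in the `PairRDDFunctions.scala` diff quoted later in this thread, can be sketched as a standalone predicate. The object and method names here are hypothetical and exist only for illustration; the `true` default for the coordination flag mirrors the default used in that diff.

```scala
// Hypothetical helper, not part of Spark: it isolates the boolean condition
// from the diff so that it can be read and tested on its own.
object DirectCommitterWarningSketch {
  def shouldWarn(
      speculationEnabled: Boolean,
      outputCommitterClass: String,
      outputCommitCoordinationEnabled: Boolean = true): Boolean = {
    // Warn only when speculation is on, the committer class name contains
    // "Direct", and output commit coordination has been switched off.
    speculationEnabled &&
      outputCommitterClass.contains("Direct") &&
      !outputCommitCoordinationEnabled
  }
}
```

With the coordination flag left at its default, this predicate is always false, so the warning would only appear for users who explicitly disable coordination.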


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/caneGuy/spark zhoukang/fix-warning

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21526
    
----
commit 6bac1531929e914764d980e4eb4228a10436876b
Author: zhoukang <zh...@...>
Date:   2018-06-11T08:57:11Z

    [SPARK][CORE] No need to warning when output commit coordination enabled

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    **[Test build #92923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92923/testReport)** for PR 21526 at commit [`ee450f5`](https://github.com/apache/spark/commit/ee450f517b3df5c61ed6cce5513ec07fc898590b).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    **[Test build #92929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92929/testReport)** for PR 21526 at commit [`63a62db`](https://github.com/apache/spark/commit/63a62dbf984aa760e48a16c014cfcf4a91fcfd7e).
     * This patch **fails to generate documentation**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92931/
    Test FAILed.


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    **[Test build #92923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92923/testReport)** for PR 21526 at commit [`ee450f5`](https://github.com/apache/spark/commit/ee450f517b3df5c61ed6cce5513ec07fc898590b).


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    **[Test build #92929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92929/testReport)** for PR 21526 at commit [`63a62db`](https://github.com/apache/spark/commit/63a62dbf984aa760e48a16c014cfcf4a91fcfd7e).


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92923/
    Test FAILed.


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21526#discussion_r202006002
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -357,6 +357,11 @@ package object config {
           .intConf
           .createWithDefault(256)
     
    +  private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
    +    ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
    +      .booleanConf
    --- End diff --
    
    Done @cloud-fan Thanks!


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Hmm, I think the problem with using a direct committer is not about speculative tasks, but about not being able to roll back an already committed task. So the output coordinator can't help here.
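
A toy model of this distinction, assuming (as the warning message in the pull request description puts it) that a direct committer "writes data directly to the final location". All types and names below are hypothetical stand-ins and are not Hadoop or Spark classes. The sketch contrasts a staged committer, whose abort can discard staged output, with a direct committer, which has nothing left to undo.

```scala
import scala.collection.mutable

// Hypothetical model only: maps stand in for a staging area and the final
// output location. None of these types exist in Spark or Hadoop.
object CommitRollbackSketch {
  final class Store {
    val staging = mutable.Map.empty[String, String]
    val finalLocation = mutable.Map.empty[String, String]
  }

  trait Committer {
    def write(store: Store, file: String, data: String): Unit
    def commit(store: Store): Unit
    def abort(store: Store): Unit
  }

  // Staged: writes go to staging, commit promotes them, abort discards them.
  object StagedCommitter extends Committer {
    def write(store: Store, file: String, data: String): Unit =
      store.staging(file) = data
    def commit(store: Store): Unit = {
      store.finalLocation ++= store.staging
      store.staging.clear()
    }
    def abort(store: Store): Unit = store.staging.clear()
  }

  // Direct: every write is visible in the final location immediately,
  // so neither commit nor abort has anything to do.
  object DirectCommitter extends Committer {
    def write(store: Store, file: String, data: String): Unit =
      store.finalLocation(file) = data
    def commit(store: Store): Unit = ()
    def abort(store: Store): Unit = ()
  }

  // Simulates a task attempt that writes one file and is then aborted.
  def abortedAttempt(committer: Committer): Store = {
    val store = new Store
    committer.write(store, "part-00000", "partial output")
    committer.abort(store)
    store
  }
}
```

In this model the aborted attempt leaves the final location empty under the staged committer but leaves `part-00000` behind under the direct one, regardless of how many attempts were allowed to commit.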


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21526#discussion_r202005902
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -357,6 +357,11 @@ package object config {
           .intConf
           .createWithDefault(256)
     
    +  private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
    +    ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
    +      .booleanConf
    --- End diff --
    
    Done @cloud-fan 


---



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21526#discussion_r201962635
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
         // users that they may loss data if they are using a direct output committer.
         val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
         val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
    -    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
    +    val outputCommitCoordinationEnabled = self.conf.getBoolean(
    +      "spark.hadoop.outputCommitCoordination.enabled", true)
    --- End diff --
    
    Updated, @cloud-fan. Thanks.


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92929/
    Test FAILed.


---



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21526#discussion_r201720609
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
         // users that they may loss data if they are using a direct output committer.
         val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
         val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
    -    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
    +    val outputCommitCoordinationEnabled = self.conf.getBoolean(
    +      "spark.hadoop.outputCommitCoordination.enabled", true)
    +    if (speculationEnabled && outputCommitterClass.contains("Direct")
    +      && !outputCommitCoordinationEnabled) {
           val warningMessage =
    --- End diff --
    
    Yea, I found a similar message in `HiveFileFormat`


---



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21526#discussion_r201963160
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
         // users that they may loss data if they are using a direct output committer.
         val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
         val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
    -    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
    +    val outputCommitCoordinationEnabled = self.conf.getBoolean(
    +      "spark.hadoop.outputCommitCoordination.enabled", true)
    +    if (speculationEnabled && outputCommitterClass.contains("Direct")
    +      && !outputCommitCoordinationEnabled) {
           val warningMessage =
    --- End diff --
    
    Also modified `HiveFileFormat`. @cloud-fan @jiangxb1987 
    The reason I did not refactor this into a common function is that I can't find a good place to put it. Any suggestions?


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    **[Test build #92931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92931/testReport)** for PR 21526 at commit [`c233c72`](https://github.com/apache/spark/commit/c233c725110d66fc712a96a253684cdb02b19e23).
     * This patch **fails to generate documentation**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21526#discussion_r202005442
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -357,6 +357,11 @@ package object config {
           .intConf
           .createWithDefault(256)
     
    +  private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
    +    ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
    +      .booleanConf
    --- End diff --
    
    .doc("when enabled, tasks will coordinate with the driver to make sure that, for a certain partition, at most one task attempt can commit.")
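
The behaviour described by this doc string, "for a certain partition, at most one task attempt can commit", can be illustrated with a minimal first-asker-wins model. This is a sketch under that assumption only; the class below is hypothetical and does not reproduce Spark's actual coordinator, which also has to deal with failed attempts and stage retries.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a driver-side coordinator. The first attempt to
// ask for a given partition is authorized; every other attempt is refused.
final class CommitCoordinatorSketch {
  // partition id -> attempt number that was authorized to commit
  private val authorized = mutable.Map.empty[Int, Int]

  def canCommit(partition: Int, attempt: Int): Boolean = synchronized {
    authorized.get(partition) match {
      case Some(winner) => winner == attempt // only the winner may commit
      case None =>
        authorized(partition) = attempt // first asker becomes the winner
        true
    }
  }
}
```

For example, if attempt 0 and a speculative attempt 1 of partition 0 both ask, only attempt 0 is told it may commit, while partition 1 is decided independently.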


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    **[Test build #92931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92931/testReport)** for PR 21526 at commit [`c233c72`](https://github.com/apache/spark/commit/c233c725110d66fc712a96a253684cdb02b19e23).


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21526#discussion_r201717626
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
         // users that they may loss data if they are using a direct output committer.
         val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
         val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
    -    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
    +    val outputCommitCoordinationEnabled = self.conf.getBoolean(
    +      "spark.hadoop.outputCommitCoordination.enabled", true)
    --- End diff --
    
    Since we are touching it, can we define this config in `org.apache.spark.internal.config`?


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy closed the pull request at:

    https://github.com/apache/spark/pull/21526


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91656/
    Test PASSed.


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    **[Test build #91656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91656/testReport)** for PR 21526 at commit [`6bac153`](https://github.com/apache/spark/commit/6bac1531929e914764d980e4eb4228a10436876b).


---



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21526#discussion_r201717982
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
         // users that they may loss data if they are using a direct output committer.
         val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
         val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
    -    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
    +    val outputCommitCoordinationEnabled = self.conf.getBoolean(
    +      "spark.hadoop.outputCommitCoordination.enabled", true)
    +    if (speculationEnabled && outputCommitterClass.contains("Direct")
    +      && !outputCommitCoordinationEnabled) {
           val warningMessage =
    --- End diff --
    
    Is this the only place? IIRC we log a warning for this in several places.


---



[GitHub] spark issue #21526: [SPARK-24515][CORE] No need to warning when output commi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21526
  
    **[Test build #91656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91656/testReport)** for PR 21526 at commit [`6bac153`](https://github.com/apache/spark/commit/6bac1531929e914764d980e4eb4228a10436876b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
