You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/11/01 16:05:53 UTC

[GitHub] spark pull request #22923: [SPARK-25910][CORE] accumulator updates from prev...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/22923

    [SPARK-25910][CORE] accumulator updates from previous stage attempt should not log error

    ## What changes were proposed in this pull request?
    
    For shuffle map stages, we may have multiple attempts, while only the latest attempt is active. However, the scheduler still accepts successful tasks from previous attempts, to speed up the execution.
    
    Each stage attempt has a `StageInfo` instance, which contains `TaskMetrics`. `TaskMetrics` has a bunch of accumulators to track the metrics like CPU time, etc. However, a stage only keeps the `StageInfo` of the latest attempt, which means the `StageInfo` of previous attempts will be GCed, and their accumulators of `TaskMetrics` will be cleaned.
    
    This causes a problem: When the scheduler accepts a successful task from a previous attempt, and tries to update accumulators, we may fail to get the accumulators from `AccumulatorContext`, as they are already cleaned. And we may hit error log like
    ```
    18/10/21 15:30:24 INFO ContextCleaner: Cleaned accumulator 2868 (name: internal.metrics.executorDeserializeTime)
    18/10/21 15:30:24 ERROR DAGScheduler: Failed to update accumulators for task 7927
    org.apache.spark.SparkException: attempted to access non-existent accumulator 2868
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1267)
    ...
    ```
    
    This PR proposes a simple fix: When the scheduler receives successful tasks from previous attempts, don't update accumulators. Accumulators of previous stage attemps are not tracked anymore, so we don't need to update them.
    
    
    ## How was this patch tested?
    
    a new test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark late-task

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22923.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22923
    
----
commit 07f900cf845662186f8d1daea3be9abe2633d5c0
Author: Wenchen Fan <we...@...>
Date:   2018-11-01T15:40:14Z

    accumulator updates from previous stage attempt

commit 4d9cbe043604e76b6367e4ecb42d0d36437d1792
Author: Wenchen Fan <we...@...>
Date:   2018-11-01T16:04:41Z

    different fix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    **[Test build #98355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98355/testReport)** for PR 22923 at commit [`4d9cbe0`](https://github.com/apache/spark/commit/4d9cbe043604e76b6367e4ecb42d0d36437d1792).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    **[Test build #98355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98355/testReport)** for PR 22923 at commit [`4d9cbe0`](https://github.com/apache/spark/commit/4d9cbe043604e76b6367e4ecb42d0d36437d1792).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    > We need to always update user accumulators
    
    Ah that's a good point.
    
    I'm going to close it. I missed one thing: the `AppStatusListener` will keep the `StageInfo` instance until all tasks of that stage attempt is finished. See https://github.com/apache/spark/pull/22209
    
    So this bug should already have been fixed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    **[Test build #98416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98416/testReport)** for PR 22923 at commit [`4d9cbe0`](https://github.com/apache/spark/commit/4d9cbe043604e76b6367e4ecb42d0d36437d1792).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98355/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4695/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    **[Test build #98416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98416/testReport)** for PR 22923 at commit [`4d9cbe0`](https://github.com/apache/spark/commit/4d9cbe043604e76b6367e4ecb42d0d36437d1792).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22923: [SPARK-25910][CORE] accumulator updates from prev...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan closed the pull request at:

    https://github.com/apache/spark/pull/22923


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    We need to always update user accumulators. Right now such task metrics just cause some annoying error logs, seems not worth to fix.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4731/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98416/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22923: [SPARK-25910][CORE] accumulator updates from previous st...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22923
  
    cc @vanzin @zsxwing @jiangxb1987 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org