You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2016/06/02 22:21:03 UTC

[GitHub] spark pull request #13479: [SPARK-15736][CORE][branch-1.6] Gracefully handle...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/13479

    [SPARK-15736][CORE][branch-1.6] Gracefully handle loss of DiskStore files

    If an RDD partition is cached on disk and the DiskStore file is lost, then reads of that cached partition will fail and the missing partition is supposed to be recomputed by a new task attempt. In the current BlockManager implementation, however, the missing file does not trigger any metadata updates / does not invalidate the cache, so subsequent task attempts will be scheduled on the same executor and the doomed read will be repeatedly retried, leading to repeated task failures and eventually a total job failure.
    
    In order to fix this problem, the executor with the missing file needs to properly mark the corresponding block as missing so that it stops advertising itself as a cache location for that block.
    
    This patch fixes this bug and adds an end-to-end regression test (in `FailureSuite`) and a set of unit tests (`in BlockManagerSuite`).
    
    This is a branch-1.6 backport of #13473.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark handle-missing-cache-files-branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13479.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13479
    
----
commit fa40a80d4c3dc271943f2a8c88bb200dbca93f6c
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-06-02T20:30:42Z

    Add failing regression test.

commit 36536d7af43148dbceae4834be6f9eee71afee40
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-06-02T21:08:43Z

    Add failing unit tests in BlockManagerSuite.

commit 8f047202a15a373c92724d99ec2e4ab2d7d30a07
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-06-02T22:15:12Z

    Fix bug.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13479: [SPARK-15736][CORE][branch-1.6] Gracefully handle loss o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13479
  
    **[Test build #59889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59889/consoleFull)** for PR 13479 at commit [`8f04720`](https://github.com/apache/spark/commit/8f047202a15a373c92724d99ec2e4ab2d7d30a07).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13479: [SPARK-15736][CORE][branch-1.6] Gracefully handle...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen closed the pull request at:

    https://github.com/apache/spark/pull/13479


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13479: [SPARK-15736][CORE][branch-1.6] Gracefully handle loss o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13479
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13479: [SPARK-15736][CORE][branch-1.6] Gracefully handle loss o...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the issue:

    https://github.com/apache/spark/pull/13479
  
    Merging into 1.6.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13479: [SPARK-15736][CORE][branch-1.6] Gracefully handle loss o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13479
  
    **[Test build #59889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59889/consoleFull)** for PR 13479 at commit [`8f04720`](https://github.com/apache/spark/commit/8f047202a15a373c92724d99ec2e4ab2d7d30a07).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13479: [SPARK-15736][CORE][branch-1.6] Gracefully handle loss o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13479
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59889/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13479: [SPARK-15736][CORE][branch-1.6] Gracefully handle loss o...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the issue:

    https://github.com/apache/spark/pull/13479
  
    can you delete branch


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13479: [SPARK-15736][CORE][branch-1.6] Gracefully handle loss o...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/13479
  
    /cc @andrewor14, this is the branch-1.6 backport of my other patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org