Posted to issues@tez.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2015/11/10 21:11:10 UTC

[jira] [Updated] (TEZ-808) Handle task attempts that are not making progress

     [ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated TEZ-808:
---------------------------
    Attachment: TEZ-808.branch-0.7.patch

Would it be possible to backport this to branch-0.7? We're going to be on 0.7 for a while, and we'd like this fix (along with TEZ-2918) so that we can catch hung tasks in production and recover from them automatically.

Attaching a version of the patch for branch-0.7.  It came over fairly cleanly.

> Handle task attempts that are not making progress
> -------------------------------------------------
>
>                 Key: TEZ-808
>                 URL: https://issues.apache.org/jira/browse/TEZ-808
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>             Fix For: 0.8.2
>
>         Attachments: TEZ-808.1.patch, TEZ-808.2.patch, TEZ-808.3.patch, TEZ-808.branch-0.7.patch
>
>
> If a task attempt is not making progress, it may cause the job to hang. We may want to kill and restart the attempt. With speculation support and free resources, we may want to run another attempt in parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)