Posted to issues@tez.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2016/08/09 16:55:20 UTC

[jira] [Commented] (TEZ-3397) Better fault tolerance heuristics for custom edge

    [ https://issues.apache.org/jira/browse/TEZ-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413834#comment-15413834 ] 

Ming Ma commented on TEZ-3397:
------------------------------

Is this only for a specific custom edge scenario, or does it apply to SCATTER_GATHER as well?

I wonder whether this will increase the likelihood of false positives. For example, the source vertex's {{TaskAttemptImpl}} keeps the list of destination tasks that have complained so far. Some of them complained because of a network issue a while back; others might have succeeded since. If the source task attempt then gets another complaint from a new destination task close to the end of the destination vertex's completion (when few destination tasks remain unfinished), the new heuristic could mark the source task as bad even though the actual issue is in the destination task.
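
To make the scenario concrete, here is a minimal sketch of how stale complaints could inflate the ratio under the proposed heuristic. It is not Tez code: the class, its methods, the task counts, and the 0.25 threshold are all hypothetical and chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class StaleComplaintScenario {
    // Hypothetical threshold, chosen only for illustration.
    static final double THRESHOLD = 0.25;

    final int totalDestinations;
    // Every destination task that has ever complained about this source.
    final Set<Integer> complainers = new HashSet<>();
    // Destination tasks that have completed.
    final Set<Integer> finished = new HashSet<>();

    StaleComplaintScenario(int totalDestinations) {
        this.totalDestinations = totalDestinations;
    }

    void complain(int dest) { complainers.add(dest); }

    void finish(int dest) { finished.add(dest); }

    // Proposed heuristic: unique complainers / unfinished destinations.
    double ratio() {
        int unfinished = totalDestinations - finished.size();
        return unfinished == 0 ? 0.0 : (double) complainers.size() / unfinished;
    }

    public static void main(String[] args) {
        StaleComplaintScenario s = new StaleComplaintScenario(100);
        // Two early complaints, e.g. from a transient network issue.
        s.complain(1);
        s.complain(2);
        System.out.println("early ratio: " + s.ratio());

        // Most destinations finish, including the two early complainers.
        for (int d = 0; d < 96; d++) {
            s.finish(d);
        }
        // One new complaint arrives while only 4 destinations are unfinished.
        s.complain(99);
        System.out.println("late ratio: " + s.ratio());
        System.out.println("source marked bad: " + (s.ratio() > THRESHOLD));
    }
}
```

In this sketch the two old complaints stay in the numerator while the denominator shrinks to 4, so a single new complaint moves the ratio from 2/100 to 3/4.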

Another question is how to test such a heuristics change, for example whether the testing would be based on some sort of simulation.

> Better fault tolerance heuristics for custom edge
> -------------------------------------------------
>
>                 Key: TEZ-3397
>                 URL: https://issues.apache.org/jira/browse/TEZ-3397
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>
> Today, a source task calculates its failure fraction by dividing the number of unique destination tasks that report a failure by the number of destination tasks that depend on this source task. A better way is to divide the number of destination tasks that report a failure by the number of *unfinished* destination tasks that depend on the source task.
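
The two ratios in the description can be written side by side as a rough sketch. This is not the actual Tez implementation: the class and method names are hypothetical, and the zero-denominator guard is an assumption, since the description does not say what should happen when no destination task is unfinished.

```java
import java.util.Set;

public class FailureFraction {
    // Current heuristic as described:
    // unique complaining destinations / all destinations depending on the source.
    static double currentFraction(Set<Integer> complainers, int totalDestinations) {
        return (double) complainers.size() / totalDestinations;
    }

    // Proposed heuristic as described:
    // unique complaining destinations / unfinished destinations depending on the source.
    static double proposedFraction(Set<Integer> complainers, int unfinishedDestinations) {
        if (unfinishedDestinations == 0) {
            return 0.0; // assumption: nothing left to serve, so report no failure
        }
        return (double) complainers.size() / unfinishedDestinations;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 2 complainers, 100 dependents, 4 still unfinished.
        Set<Integer> complainers = Set.of(3, 7);
        System.out.println(currentFraction(complainers, 100));
        System.out.println(proposedFraction(complainers, 4));
    }
}
```

With these made-up numbers the current fraction is 2/100 and the proposed fraction is 2/4, which shows how much more sensitive the proposed denominator becomes late in the destination vertex's run.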



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)